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ABSTRACT 


An  algorithm  is  developed  for  estimating  the  state  of  a  linear  dy- 
namic system  excited  by  a  random  sequence.   The  input  data  are  noisy 
observations  which  are  nonlinear  functions  of  the  state.   The  estimates 
are  best  in  the  sense  of  least  squared  residuals.  A  significant  problem 
in  radar  tracking  is  investigated  and  the  effectiveness  of  the  algorithm 
verified. 
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1.    Introduction. 

The  problem  to  be  considered  is  that  of  state  estimation  where  the 
observations  are  nonlinear  functions  of  the  state  of  the  system  corrupted 
by  additive  white  noies.   The  equations  of  motion  of  the  state  are  lin- 
ear in  the  state  and  the  excitation  which  includes  random  components.   A 
significant  constraint  on  the  solution  is  that  the  conceptual  solution 
to  this  problem  must  lead  to  an  explicit  procedure  which  can  be  realized 
on  a  digital  computer  and  the  scheme  must  produce  estimates  as  the  obser- 
vations are  received. 

The  theory  for  the  case  where  system  dynamics  and  observation  func- 
tions are  linear  is  very  highly  developed.   However,  when  nonlinear it ies 
are  introduced  there  are  virtually  no  completely  satisfactory  solutions. 

Two  methods  for  handling  the  nonlinear  problem  have  been  introduced. 
The  first  entails  a  linearization  about  a  nominal  trajectory  in  state 
space.   Its  success  depends  upon  the  accuracy  of  the  nominal  trajectory. 
This  technique  has  little  hope  of  success  in  a  situation  where  there  is 
almost  no  prior  information  about  the  trajectory.   The  target  acquisition 
problem  is  an  example  of  such  a  situation. 

The  second  method  is  more  of  theoretical  interest  than  as  a  candi- 
date for  a  computational  procedure.  In  this  development  the  viewpoint  is 
taken  that  the  output  of  a  state  estimator  should  be  the  conditional  pro- 
bability density  function,  conditioned  upon  all  past  data.  Computational 
difficulties  arise  from  an  effort  to  compute  a  complete  function  over  the 
entire  state  space  as  compared  to  a  more  conventional  estimator  which 
selects  a  single  point  in  the  state  space  as  the  most  likely  state. 

This  study  was  particularly  motivated  by  the  difficulties  encountered 
in  filtering  radar  returns  from  an  airborne  target.   The  target  dynamics 


are  presumably  describable  by  a  linear  dynamic  system  where  the  elements 
of  the  state  vector  are  the  position  and  velocity  of  the  target  in  car- 
tesian coordinates  measured  from  the  radar.   On  the  other  hand,  data  avail- 
able to  a  filter  from  the  radar  will  usually  be  in  spherical  coordinates. 
Thus  there  exists  a  known,  nonlinear  relationship  or  transformation  be- 
tween the  state  of  the  system  and  observations.  The  results  of  a  series 
of  experiments  using  a  filter  based  upon  a  simple  linearization  procedure 
are  reported  in  Demetry  and  Hudson  [4].  The  operation  of  the  filter  was 
unsatisfactory  under  realistic  conditions  of  initial  uncertainty  about 
target  position,  i.e.,  where  to  evaluate  the  partials  involved  in  the 
linearization  procedure. 

The  present  work  fills  a  gap  in  the  field  of  nonlinear  estimation  for 
problem  in  which  there  is  little  prior  information  and  a  computationally 
feasible  estimation  procedure  is  required. 

The  problem  is  discussed  and  precisely  formulated  in  Section  2.  The 
original  filtering  problem  is  replaced  by  an  associated  minimization  prob- 
lem. The  work  of  Kalman  is  reviewed  since  it  is  known  that  the  Kalman 
filter  is  the  solution  to  the  associated  minimization  problem  when  the 
observations  are  linear  functions  of  the  states.  A  special  single-stage 
form  of  the  problem  is  considered.   It  is  shown  that  the  nonlinear  obser- 
vation problem  can  be  approximated  by  a  linear  problem  whose  solution  is 
known  from  Kalman' s  work.   The  resulting  solution  is  then  used  to  generate 
a  new  approximation  to  the  original  problem.   Each  iteration  of  the  above 
process  reduces  the  function  to  be  minimized  so  long  as  the  gradient  is 
non-zero.   It  is  shown  that  the  whole  class  of  problems  originally  con- 
sidered can  be  cast  in  the  special  single-stage  form. 

The  problem  may  be  said  to  be  solved  at  this  point;  however,  for  real- 
time calculations  there  must  be  procedures  for  controlling  the  number 
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of  iterations.   Iteration  control  procedures  are  discussed  along  with 
heuristic  means  for  choosing  the  control  parameters.   Even  with  the  num- 
ber of  iterations  kept  to  a  finite  number,  the  computing  requirements 
would  increase  indefinitely  since  each  new  set  of  data  generates  a  new 
minimization  problem  over  a  larger  number  of  variables.   The  control  of 
this  effect  is  discussed  by  introducing  the  concept  of  noise  generated 
by  the  nonlinearities.  The  overall  algorithm  is  discussed  with  respect 
to  implementation  on  a  digital  computer. 

The  method  was  used  on  a  realistic  radar  tracking  problem.   The 
models  for  the  target  and  the  radar  are  discussed.   The  results  of  the 
study  indicate  that  the  method  produces  reasonable  estimates  of  the  states 
of  the  system. 


11 


2.   Detailed  Statement  of  the  Problem. 

The  specification  of  the  problem  may  be  viewed  as  consisting  of  four 
parts.   Three  of  these  are  different  types  of  information  about  the  system 
under  observation  and  the  last  is  an  implicit  statement  of  how  the  avail- 
able information  should  be  combined  to  form  an  estimate  of  the  state  of 
the  system. 

The  first  type  of  information  about  the  system  is  expressed  by  form- 
ing a  dynamic  model  of  the  system.   In  this  way  the  relation  between 
states  at  different  times  is  made  explicit.   It  is  only  through  the  dy- 
namic relation  of  the  states  that  observations  taken  at  diverse  times 
have  any  relation  to  one  another.   In  general  usage  the  term  filter  is 
associated  with  a  sequence  of  observations  which  are  related  to  one 
another  (correlated).  The  states  and  the  dynamic  model  embody  this  cor- 
relation. 

The  second  type  of  information  is  the  relationship  of  the  observa- 
tion at  a  given  time  to  the  state  at  the  same  time. 

The  third  type  of  information  is  the  a  priori  knowledge  about  the 
state  of  the  system. 

Finally   the  best  estimate  of  the  state  is  defined.   In  general,  none 
of  the  information  about  the  state  is  definitive;  it  is  all  subject  to 
some  uncertainty  and  except  for  this  uncertainty,  it  would  be  contra- 
dictory. The  best  estimate  is  defined  to  effect  a  certain  compromise 
among  all  of  the  available  information. 

Since  the  resulting  definition  is  implicit,  computational  difficul- 
ties occur.   The  resolution  of  these  difficulties  results  in  an  explicit 
procedure  for  producing  an  estimate  which  is  almost  best  within  a  reason- 
able computing  time.   Such  an  explicit  procedure,  or  algorithm,  will  be 
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called  a  filter. 

The  Dynamic  Model 
The  time  evolution  of  the  states  of  the  system  which  is  being  observed 
are  assumed  to  be  adequately  described  by  a  difference  equation  which  is 
linear  in  both  the  state  and  the  excitation.   The  excitation  is  assumed  to 
be  a  white  random  time  sequence  which  has  known  first  and  second  moments 
and  is  independent  of  the  state.   If  the  mean  of  the  random  excitation  is 
not  zero,  then  the  random  signal  can  be  decomposed  into  a  deterministic 
component  and  a  zero-mean  random  component.   The  development  will  assume 
that  there  is  no  deterministic  component  (or  non-zero  mean) .   A  short  com- 
ment will  be  made  at  the  appropriate  point  outlining  the  required  changes 
for  the  case  of  deterministic  inputs.   These  notions  are  concisely  stated 
in  (1)  through  (4). 

*(k+l)  -  $jr(k)  +  T/U)(k)  (1) 

Eru)(k)]  ^  0  (2) 

***)  .u)T]  -  {I  £  Jjj  o) 

E[w(k)  x(j)T]  =  0  for  k£j  (4) 

where  x   is  the  state  vector  of  n  components, 

r  is  a  known  n  x  r  input  distribution  matrix, 

u)   is  a  random  excitation  of  r  components, 

<f   is  a  known  r  x  n  state  transition  matrix, 

Ef  ]  is  the  expectation  operation,  and 

T   denotes  the  transpose. 
Equation  (1)  expresses  the  linearity  of  the  state  dynamics  while  (2),  (3), 
and  (4)  express  the  qualities  of  zero  mean,  whiteness,  and  independence 
respectively  about  the  excitation.   The  assumption  that  the  covariance 
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of  <u(k)  is  the  identity  matrix  involves  no  loss  of  generality  as  long  as 
r  is  appropriately  chosen.   Consider  the  two  random  excitations  Tu)  and 
r'u)'  where 

E[u>]  «  0 

ElV]  =   0 

E|"u>  U)T]  s   I 

T 

E[u>f   U)'    ]  =   Q 

By  comparing  first  and  second  moments,  the  two  random  excitations  are 

T         T 
equivalent  if  T  T    =  T'OT   •   Q  is  a  covariance  matrix  and  thus  is  sym- 
metric and  positive  semi-definite.   This  implies  that  a  decomposition  can 

T 
be  found  such  that  B  B  =  Q,  and  r  ■  T'B. 

The  Observations 
The  data  available  to  the  filter  are  nonlinear  functions  of  the 
state  corrupted  by  additive  white  noise.   The  functional  relation  is 
assumed  to  be  twice  differentiable  in  the  state.  The  corrupting  noise 
is  assumed  to  have  zero  mean  and  known  variance.  The  noise  is  assumed 
to  be  independent  of  the  states  and  the  excitation.   These  notions  are 
concisely  expressed  as 

*(k)  =  hCe(k))  +  u(k)  (5) 

El>(k)]  -  0  (6) 

*»«•«>*]- ft  £  Si  <7> 

E[u(k)  *(j)T]  =  0  (8) 

E!"u(k)  u)(j)T]  «  0  (9) 

where  z  (k)  is  an  m  vector  of  observations  at  time  k, 
u(k)  is  an  m  vector  of  noise  at  time  k, 
h(  )  is  an  m  vector  of  nonlinear  functions  of  the  states 
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A         is  the  ra  x  m  covariance  matrix  of  the  measurement  noise. 

A  Priori  Information 
In  view  of  (1)  any  information  relative  to  jr(k)  must  also  be  consid- 
ered when  estimating  JC(k-H) .   In  conventional  linear  sequential  stage-by- 
stage  estimation  one  can  consider  two  distinct  phases.   The  first  is  to 
bring  forward  all  information  from  the  past  observations  in  the  form  of 
an  a  priori  estimate  using  (1).   The  second  phase  is  then  to  adjust  this 
a  priori  estimate  in  view  of  the  actual  observations  of  the  state.   This 
is  repeated  from  stage  to  stage.   Clearly  this  process  must  start  with  a 
given  a  priori  estimate  for  the  first  state.   Sometimes  this  a  priori 
estimate  serves  merely  as  a  mathematical  convenience  for  starting  the  fil- 
ter [9].   In  other  cases  an  appropriate  a  priori  estimate  is  really  avail- 
able.  In  any  case  an  a  priori  estimate  takes  the  form  of  an  estimate  of 
the  initial  state  coupled  with  a  measure  of  the  accuracy  of  this  estimate, 
the  covariance  of  the  error,  defined  as 

E[(r(l)-r(l/0))  (*(1/0))T]  s  P(l/U)  (10) 

E[.r(l)-0f(l/0)]  »  0  (11) 

where  Jf(l/U)  is  the  a  priori  estimate  of  JP(1),  and 

P(l/0)    is  the  covariance  of  the  a  priori  estimate. 
The  double  index  argument  will  be  used  throughout  to  indicate  an  estimate 
of  the  state  associated  with  the  first  index  based  on  observations  up  to 
and  including  those  associated  with  the  second  index. 

Definition  of  Best  Estimate 
The  best  estimate  of  sequence  of  states  x(l)    through  r(k)  will  be 
called  J?(l/k)  through  j-(k/k)  and  is  defined  as  that  sequence  which  mini- 
mizes the  scalar  quantity 
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CO<l/k),  *(2/k),  ...,  *(k/k)]  -  ||  *(l/k)  -  x(VO)   ||J 

k 
+  ^||  x(i/k)  -**(i-l/k)  (J 
is2  2 

k 
♦  ^||  *(i)  -  hC*(i/k))||£  (12) 

i=l  3 

where  the  norm  notation  is  introduced  for  compactness.   By  definition 

||b||   is  equivalent  to  b  Ab  and  the  result  is  a  scalar  which  is  a  quad- 

A 

ratic  function  of  the  elements  of  b. 

A  best  estimate  defined  in  this  manner  might  be  called  a  weighted- 
least-squares  estimate  generalized  to  include  the  case  where  the  quantity 
to  be  estimated  is  changing  somewhat  randomly  in  time. 

The  three  different  types  of  terms  correspond  to  (10),  (1),  and  (5) 
respectively.  The  terms  inside  the  vertical  bars  are  called  residuals. 
The  residuals  are  the  difference  between  the  expected  value  of  a  function 
of  the  true  states  and  that  same  function  evaluated  at  the  estimate  of 
the  state. 

This  definition  of  the  best  estimate  can  be  interpreted  in  several 
ways  which  will  be  developed  below.  These  interpretations  cannot  in  any 
sense  prove  that  estimates  defined  in  this  way  are  best.  The  most  that 
can  be  hoped  for  is  that  the  interpretations  offered  will  enhance  the 
reasonableness  of  the  resulting  estimate.   It  must  be  realized  that  the 
best  estimate  is  only  what  it  is  defined  to  be,  which,  in  the  final  anal- 
ysis  is  certainly  somewhat  arbitrary. 

First,  if  all  of  the  random  sequences  are  assumed  to  be  sequences  of 
Gaussian  (or  normal)  random  variables,  then  it  is  possible  to  compute  the 
probability  of  the  observed  data  for  a  given  sequence  of  states.  This 
probability  is  viewed  as  a  function  of  the  true  states.  That  sequence  of 
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the  states  which  maximizes  the  probability  is  taken  as  an  estimate  of 
the  true  states.   The  probability  is  commonly  called  the  likelihood  and 
can  be  expressed  as 

L(#(l),  *(2),  ...,  X(k))  -  K  exp  [-  |  C(*(l),  x(2)  ,  ....  *(k))] 
where  K  is  a  constant  which  is  independent  of  the  states  if  the  weighing 
matrices  are  chosen  as 

Wt  -  -PC1/0)"1  (13) 

W2  -  (riV1  (14) 

W3  -Z?'1  (15) 

The  covariances  are  assumed  to  be  nonsingular  for  simplicity.   For 
a  detailed  discussion  of  the  case  of  singluar  covariance  matrix  see 
Appendix  I.   It  is  clear  that  to  maximize  L  one  must  minimize  C.   Thus 
for  the  case  of  Gaussian  random  variables  the  best  estimate  will  be  the 
so  called  maximum  likelihood  estimate. 

A  second  point  of  view  is  that  it  would  be  desirable  to  find  an 
estimate  which  resulted  in  zero  residuals;  requiring  zero  residuals,  how- 
ever, would  imply  a  larger  number  of  constraints  than  there  are  adjustable 
parameters  (estimates).   This  suggests  that  the  best  estimate  would  be 
some  sort  of  compromise  where  all  of  the  residuals  are  small.   Such  a  com- 
promise is  effected  by  setting  up  a  weighted  sum  of  squares  of  the  re- 
siduals, C,  as  a  function  of  the  estimates,  and  selecting  (or  defining 
in  this  case)  that  set  of  estimates  which  minimize  C  as  the  best  esti- 
mate. 

In  general,  the  best  estimate  will  depend  upon  the  weighting  chosen 
for  each  residual.  In  order  to  determine  the  appropriate  weighting  mat- 
rices it  is  helpful  to  consider  under  what  circumstances  equal  weighting 
would  be  appropriate.   A  heuristically  reasonable  answer  might  be  to 
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weight  equally  when  each  of  the  random  components  have  the  same  variance. 
If  each  of  the  sets  of  equations  are  multiplied  by  appropriate  matrices 
new  random  variables  can  be  defined  so  that  each  has  unit  variance.  The 
residuals  from  these  new  equations  are  computed  and  their  squares  all 
weighted  equally.   In  this  form  the  residuals  have  an  intuitively  rea- 
sonable weighting.   But  it  can  be  shown  that  this  is  equivalent  to 
weighting  the  original  residuals  with  the  inverse  of  the  corresponding 
covariance. 

A  very  simple  example  should  clarify  the  argument.   Consider  the 
problem  of  estimating  x   from  two  observations 

Zx  =  h1Cxr)  +  Vx 

22  =  h20?)  +  v2 

where  EfU,]  ■  0, 

E!"l>2]  =  0, 

EruJ]  *  o\> 

EfU^  a22- 
Dividing  the  first  equation  by  o,  and  the  second  by  a2  yields 

*l/ai  "  hi(,r)/ai  +  ui 

and  *2^a2  "  h2(jp^/a2  +  U2 

where  u,  and  u.  are  unit  variance  random  variables.   So  the  sum  of  squared 
residuals  is  formed 

cCx)  -  (f1hl  -  h^  )hx)2  +  u2/a2  -  h20c)/a2)2 

but   this   is   equivalent   to 

C(r)  «   G^-h^J))2/***  (m2  -  h2(xr))2/a2     . 
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Therefore,  If  it  is  reasonable  to  weight  equally  residuals  associated 
with  random  variables  of  equal  variance  then  it  follows  for  unequal  vari- 
ances the  weighting  should  be  in  inverse  proportion  to  the  variance. 

Another  point  of  view  is  that  the  weighting  matrices  W  ,  W  and  W 
are  chosen  directly  without  recourse  to  any  assumed  random  variables. 
There  is,  of  course,  an  equivalent  problem  cast  in  terms  of  random  vari- 
ables.  It  is  the  author's  view  that  it  is  easier  to  assess  the  magnitude 
of  the  variances  of  the  random  variables  than  to  choose  an  appropriate 
set  of  weighting  matrices  directly. 

A  general  model  of  the  underlying  physical  process  which  generates 
the  data  has  been  presented.   For  any  real  physical  situation  the  para- 
meters $,  r,  /?,  tf(l/0)  and  P(l/0)  will  be  numerical  quantites  and  the  non- 
linear functions  h(x)   will  be  a  vector  of  explicit  functions.   This  is  the 
type  of  information  which  a  filter  designer  must  have  before  the  filter 
can  be  constructed.   For  a  given  model  there  are  many  possible  filters 
which  might  be  considered;  in  each  case  presumably  the  output  of  the  fil- 
ter would  be  best  in  some  sense,  often  unspecified.   The  filter  under 
consideration  in  this  work  is  defined  by  the  sense  in  which  the  estimates 
are  best,  i.e.,  the  filter  is  a  least-squared-residuals  filter.   Thus,  it 
should  be  noted  that  the  filtering  problem  has  been  transformed  into  a 
sequence  of  minimization  problems.   It  will  turn  out  that  the  solution  to 
each  minimization  problem  is  itself  a  sequence  of  solutions  to  a  much 
simpler  minimization  problem,  namely  the  problem  with  linear  observation 
functions.   The  solution  to  this  simpler  problem  is  well  known  and  is 
discussed  in  the  next  section  along  with  other  filter  techniques  where 
the  model  is  similar  to  the  one  described  in  this  section. 
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3.   Prior  Work. 

The  field  of  state  estimation  has  received  a  great  deal  of  interest 
recently,  especially  since  the  work  of  R.  E.  Kalman  f7]  and  R.  E.  Kalman 
and  R.  Bucy  [8]  in  linear  estimation  theory.   For  a  comprehensive  survey 
of  the  general  field  of  estimation  Deutsch  f*6]  is  suggested.   Lee  [9] 
presents  a  fine  treatment  of  the  theory,  particularly  with  respect  to  the 
relationship  between  control  and  estimation  theory.   Cox  [2,    3]  is  a 
specialized  review  of  the  efforts  in  the  area  of  nonlinear  estimation  of 
which  this  work  is  a  special  case. 

The  structure  of  the  problem  and  the  results  obtained  by  Kalman  [7] 
along  with  several  extensions,  modifications  and  alternate  interpretations 
are  discussed  in  some  detail.   This  discussion  provides  a  convenient  refer- 
ence for  comparison  of  the  results  of  this  thesis  as  well  as  an  opportunity 
to  establish  certain  known  results  which  will  be  needed  in  the  development 
of  the  method  that  follows.   The  method  of  trajectory  linearization  is  dis- 
cussed and  it  is  shown  how  a  natural  extension  leads  to  the  method  used 
here. 

Kalman  Filter 

Kalman  [7]  has  solved  a  special  case  of  the  problem  under  consider- 
ation where  the  measurements  are  linear  functions  of  the  state. 
Symbolically 

*(k)  -  H  *(k)  +  u(k)  (16) 

replaces  (5) . 

Two  cases  are  considered.   In  the  first,  all  of  the  random  sequences 
are  assumed  to  be  sequences  of  Gaussian  random  variables.   With  this  as- 
sumption, the  estimate  is  shown  to  be  optimum  in  the  sense  that  any  linear 
function  of  the  estimate  is  the  minimum  variance  estimate  of  the  same 
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linear  function  of  the  true  state.  The  estimate  turns  out  to  be  a  linear 
function  of  the  observations. 

In  the  second  case,  there  are  no  assumptions  about  the  form  of  the 
probability  density  functions  of  the  random  variables,  but  the  estimate 
is  assumed  to  be  a  linear  function  of  the  observations.   The  optimum 
estimate  is  defined  in  the  same  way. 

In  either  case  the  method  of  computing  the  optimum  estimate  (the 
filter)  is  the  same.  The  sequence  of  operations  can  be  envisioned  as 
consisting  of  two  steps.   The  first  will  be  called  the  prediction  equation 

*(k+l/k)  .  4*(k/k)  (17) 

The  double  argument  notation  will  always  indicate  an  estimate.   The  left 
side  should  be  read;  the  estimate  of  the  state  at  time  k+1  given  data  up 
to  time  k.   The  second  step  will  be  called  the  adjustment  for  newly  re- 
ceived data. 

je(k+l/k+l)  =  *(k+l/k)  +  <?(k)  O(kfl)  -  H  jc(k+l/k)]  (18) 

where  (>(k+l)  -  H  jc(kfl/k)]  is  the  error  in  the  predicted  observations, 
and£(k)  is  a  matrix  of  adjustment  coefficients.   The  matrix  G  (k)   reflects 
the  relative  confidence  one  should  have  in  the  observed  data  as  compared 
to  the  predicted  estimate.   This  is  discussed  in  [ 9~\ . 

G(k)  =  P(k+  1/k)  ffT  \H  P(k+  1/k)  HT  +  7?]"1  (19) 

Where  P(k+l/k)  is  the  covariance  of  the  estimates  defined  as  follows: 

P(k+l/k)  =  ErC*(k+l/k)  -  x(k+l))Cx(k+l/k)  -  jc(kf  l)1)]    .      (20) 
This  formulation  then  requires  that  one  must  keep  track  of  the  co- 
variance  of  the  estimates.   This  is  also  done  in  two  steps. 

P(k+  1/k)  =  $  ?(k/k)  $T  +  r  I*  (21) 

P(k+l/k+ 1)  = 
P(k+l/k)  -  P(kf  1/k)  HT  [H  P(k+l/k)  HT  +  -ft]"1  H  P(k+l/k)  (22) 
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Rauch  [13]  has  derived  these  same  equations  based  on  the  Gaussian 
assumption  and  shown  the  resulting  estimate  is  the  conditional  mean  and 
the  maximum  likelihood  estimate. 

Lee  [9]  has  shown  that  the  same  equations  yield  a  weighted  least 
squared  residual  estimate  where  the  weightings  are  the  inverses  of  the 
covariance  matrices.   This  also  follows  from  the  derivation  of  the  maxi- 
mum likelihood  estimate  based  upon  the  assumption  that  the  random  se- 
quences are  Gaussian. 

Trajectory  Linearization 
The  trajectory  linearization  method  has  been  used  to  solve  nonlinear 
filtering  problems  such  as  orbit  determination  for  artificial  satellites 
[10],  [11].   It  is  assumed  from  physical  considerations  that  the  evolution 
of  the  system  state  satisfies  a  general  difference  equation  of  the  form 

x°(k+l)  =  fCr(k),  uo(k),  k)  (23) 

and  that  observations  are  available  in  the  form 

*(k)  =  hU(k),  u(k),  k)  (24) 

where  oi(k)  and  v(k)   are  random  vectors. 

It  is  further  assumed  that  there  is  a  known  nominal  trajectory  which 
is  a  sequence  of  states  x   (k)  such  that 

*°(k+l)  =  f0c°(k),  0,  k)   .  ^25) 

It  is  desired  to  find  the  best  estimate  of  the  deviation  of  the  true 
state  from  the  nominal  state. 
Defining 

^(k+1)  m   x(k+l)  -  *°(k+l)  (26) 

and  substituting  (23)  and  (25)  into  (26)  results  in 

j/(k+l)  =  fCr(k),  u)(k),  k)  -  fC*°(k),  0,  k)   .  (27) 

This  relation  is  now  approximated  by  a  first  order  Taylor  series  about 
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the  point  jc(k)  =  x°(k)  and  u)(k)  -  0.  The  two  partial  derivatives  are 
given  appropriate  symbols. 

4  (k)  ■  f  0c°(k),  0,  k)  (28) 

X 

r  (k)  ■  f^OO,  0,  k)  (29) 

The  first  order  Taylor  series  expansion  results  in 

j/(k+l)  =  f0c°(k),  0,  k)  +  $(k)  [,y(k)  -  x°(k)] 

+  T(k)  m(k)  -  fU°(k),  0,  k)    .  (30) 

Substitution  of  (26)  in  (30)  results  in 

j/(k+l)  «  <|(k)  i/(k)  +  T(k)  u>(k)    .  (31) 

From  the  nominal  trajectory  it  is  possible  to  construct  a  nominal 
set  of  observations 

*°(k)  =  h0c°(k),  0,  k)    .  (32) 

Consider  the  deviations  of  the  observations  about  the  nominal  obser- 
vations. 

U(k)  2*(k)  -  *°(k)  (33) 

Substituting  (24)  and  (32)  in  (33)  yields 

u(k)  =  hCr(k),  y(k),  k)  -  hCr°(k),  0,  k)  (34) 

The  relation  is  approximated  by  a  first  order  Taylor  series  about 
the  point  .r(k)  =  x   (k)  and  u(k)  =  0.  The  two  partial  derivatives  are 
given  appropriate  symbols 

#(k)  s  hf(r°(k)>  0:  k)  (35) 

S(k)  e  huCf°(k),  0,  k)  (36) 

u(k)~  hCr°(k),  0,  k)  +  ¥(k)  fje(k)  -  *°(k)] 

+  S(k)  y(k)  -  h(r°(k),  0,  k)  (37) 

Substitution  of  (2b)  in  (37)  yields 

u(k)  *  ff(k)  y(k)  +  S(k)  u(k)  (38) 
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Although  it  was  not  noted  in  the  description  of  the  Kalraan  filter, 
it  is  true  that  the  equations  remain  valid  if  any  or  all  of  the  matrices 
*,  H>   r,  B,   are  known  functions  of  time.   These  filter  equations  are  thus 
directly  applicable  to  (31)  and  (38). 

The  purpose  of  the  process  of  trajectory  linearization  is  to  generate 
(31)  and  (38).   It  is  then  noted  that  with  respect  to  the  states  j/(k)  and 
the  observations  u(k)  the  model  is  in  the  form  of  a  linear  dynamic  system 
and  linear  observations.   The  Kalman  filter  is  then  applied  directl>  as 
though  (31)  and  (38)  were  equalities. 

Nonlinear  Noise 

A  question  naturally  arises  concerning  the  adequacy  of  the  first  ord- 
er approximation   in  developing  (31)  and  (38).   The  heart  of  this  problem 
is  investigated  by  Denham  and  Pines  T5]  through  the  use  of  a  very  simpli- 
fied model  and  a  number  of  Monte  Carlo  studies.   They  reach  the  conclusion 
that  the  difficulties  are  of  an  indirect  nature.   The  first  estimates  are 
about  as  good  as  might  be  expected.   In  processing  subsequent  data,  how- 
ever, trouble  develops  because  the  assumed  quality  of  the  first  estimate 
is  too  great,  which  means  that  the  next  data  get  weighted  too  lightly. 

This  effect  can  be  best  seen  by  reconsidering  (31).   In  order  to  make 
this  expression  into  an  equality,  all  of  the  higher  order  terms  must  ">e 
added  to  the  right  side  of  the  expression.   These  additional  terms  should 
be  considered  as  part  of  the  observation  noise.   It  is  the  failure  to 
account  for  this  nonlinear  noise  in  the  Kalman  filter  that  causes  the  dis- 
crepancy between  the  covariance  of  the  estimates  as  computed  in  the  filter 
and  the  true  average  squared  estimation  errors.   When  the  calculated  co- 
variance  overstates  the  quality  of  a  given  estimate,  a  subsequent  obser- 
vation will  certainly  be  combined  with  the  current  estimate  in  a  non- 
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opt  imum  manner . 

Denham  and  Pines  point  out  that  when  the  order  of  magnitude  of  the 
expected  value  of  the  neglected  terms  is  of  the  order  of  the  natural 
measurement  noise  one  cannot  expect  the  linearized  filter  to  work  proper- 
ly.  The  expression  for  the  "nonlinear  noise"  involves  the  difference 
between  the  true  state  and  the  point  about  which  the  linearization  takes 
place.   If  there  were  some  way  to  reduce  this  difference  then  the  nonlinear 
noise  would  be  reduced  correspondingly.   These  authors  attribute  to  John 
Breakwell  an  iterative  procedure  for  accomplishing  this.   The  procedure 
is  to  linearize  and  filter,  then  relinearize  at  the  new  estimate  and  fil- 
ter again.   This  cycle  is  repeated  until  the  output  of  the  filter  is  the 
same  as  the  point  at  which  the  linearization  takes  place.   A  Monte  Carlo 
study  using  this  iterative  technique  revealed  that  the  computed  covariances 
quite  accurately  reflected  the  quality  of  the  estimates. 

It  will  be  noted  that  the  method  used  to  solve  the  least-squared -re- 
sidual problem  as  developed  in  the  next  section  is  exactly  the  iterative 
method  suggested  by  Breakwell. 
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4.    Development  of  the  Solution  Algorithm. 

The  development  will  proceed  in  two  phases,  the  first  being  a  con- 
ceptual means  of  finding  the  absolute  best  estimate,  the  second  being  the 
development  of  a  series  of  compromises  required  for  computational  feasi- 
bility. 

The  minimization  of  (12)  will  have  to  be  carried  out  in  an  iterative 
fashion  since  the  simple  process  of  differentiating  and  setting  to  zero 
does  not  lead  to  an  explicit  formula  for  the  state  estimates  as  it  would 
if  the  observation  functions  were  linear.   The  iterative  procedure  is 
based  upon  a  linearization  of  the  observation  functions.   The  linearized 
observations  are  then  in  the  form  (16)  and  minimization  is  carried  out 
using  the  Kalman  filter  equations.   This  produces  a  set  of  state  estimates 
about  which  the  nonlinear  functions  can  be  relinearized.   This  process  is 
repeated  until  there  is  no  further  change  in  the  state  estimates. 

The  Kalman  filter  in  its  normal  form  is  not  completely  adequate  since 
its  output  is  the  sequence  of  estimates  je(l/l)  through  jp(k/k)  while  the 
point  about  which  it  is  desired  to  linearize  is  ,y(l/k)  through  ,#(k/k)  . 
These  latter  estimates  are  called  the  smoothed  estimates.   There  are  for- 
mulas available, due  to  Rauch  [13], for  converting  the  output  of  the  Kalman 
filter  into  smoothed  estimates.   There  is,  however,  a  more  convenient  way 
to  get  the  same  results  in  this  case  where  all  of  the  smoothed  estimates 
are  required.   This  involves  converting  from  a  multistage  problem  to  a 
single-stage  problem  with  a  proportionately  enlarged  state  space.   The  de- 
tails of  this  conversion  will  be  discussed  following  the  discussion  of  the 
iteration  for  the  single-stage  process. 
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5.   The  Single  Stage  Minimization  Procedure. 

The  equations  related  to  the  single-stage  process  are  rewritten  here 
using  a  simplified  notation  and  are  numbered  using  the  same  numbers  as  in 
their  first  appearance  with  an  '  added. 

z   a  h(x)  +  V  (5') 

E[u]  .  0  (6') 

E[UUT]  =  B  (7') 

Eru*T]  -  0  (8') 

ErC*-*0)Cx-*0)T]  =  PQ  (10') 

Erjc-jfo]  =  0  (ii') 

CO^)  =  \\xx   -  xj\2p   -1  +  Ik  -  hCr^U  V"l  (12,) 

o 

where 

2  is  a  vector  of  observations, 

y  is  the  noise  in  these  observations, 

B  is  covariance  of  the  noise, 

X  is  the  a  priori  estimate  of  the  true  state  x> 

X,  is  the  new  estimate  of  x,  and 

P     is  the  covariance  of  the  a  priori  estimate. 

o  r 

The  pertinent  equations  from  the  solution  to  the  linear  problem  are 
also  rewritten  here  . 

z  =  H  x  +  v  (16') 

XX  =  xQ  +  G(*-HxQ)  (18') 

g  =  pjprH  poht  + ;?]" 1  (19') 

The  non-singularity  of  P     assumed  in  (12')  is  fully  discussed  in 
Appendix  I.   It  is  sufficient  here  to  say  that  if  p     is  singular  the 
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problem  can  be  reduced  to  a  smaller  state  space  where  the  associated  P 
is  non-singular.   It  will  always  be  assumed  that  B   is  non-singular,  i.e., 
the  assumption  is  made  that  there  are  no  observations  of  unlimited  ac- 
curacy. 

The  iterative  minimization  procedure  involves  a  linear  (first-order) 
approximation  to  (5')  about  a  point  x,   which  will  always  be  the  best 
available  estimate  of  x*     This  linearized  approximation  is  then  manipulat- 
ed to  form  a  synthetic  observation  which  has  the  form  (16').   This  syn- 
thetic observation  is  used  in  (18')  and  a  new  approximation  to  the  best 
estimate  is  obtained.   Using  this  estimate  as  the  point  of  linearization 
of  (5*)  the  process  is  repeated.   These  steps  are  expressed  symbolically 
as  follows. 

Z   =  h(x[)   +  h^ (x[)   [*-**]  +  v  (39) 

where  jf,    is   the   ith  approximation  to  the  best   estimate  x  . 

ZL=    Z   -  hCxJ)  +  tf1  x\  (40) 

zi  =  Hix  +  v  (41) 

where  z      is  the  so-called  synthetic  observation  and 

ff1-  \<*\)  (42) 

x\+l   =  X0^G%i-Hi  xQ)  (43) 

where 

Gl  =  PQ  HiT   [fflPotfiT  +  tf]"1  (44) 

This  completes  the  description  of  the  pure  minimization  algorithm 
except  for  a  specification  of  the  initial  point  of  linearization  and  the 
possibility  of  overshoot.   Discussion  on  the  initial  point  of  linearization 
will  follow  after  the  discussion  of  conversion  from  multi-stage  to  single 
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stage.   The  possibility  of  an  overshoot  results  from  the  fact  that  a 
simple  first  order  approximation  to  the  nonlinearity  may  not  be  accurate 
for  the  subsequent  change  in  x , •   The  overshoot  protection  scheme  used 
is  simply  if 

C(jfJ+1)  *  C(x[) 

then  x.        is  replaced  by  1/2  (x.        +  x, )* 

It  will  now  be  shown  that  each  step  in  this  process  does  result  in 
a  decrease  in  the  cost,  C.   Since  if  the  new  cost  is  greater  than  the  old 
cost  the  step  size,  #    -  jf   ,  is  reduced  by  a  factor  of  2  it  only  is 
necessary  to  show  that  for  a  small  enough  step  size  there  will  be  a  re- 
duction in  C.   The  demonstration  will  be  begun  by  showing  that  the  change 
in  estimate  is  related  to  the  negative  gradient  of  the  cost  evaluated  at 
X-,    by  a  positive  definite  matrix. 


g(x[)   =  -k  Cx     (x[)  (45) 


gOffJ)  =  P"0l    <X0-x[)  +  HiT  fi~l[2-h(x\)]  (46) 
The  change   in   the   iterate   is 

d  ■  jc1       -  xl  (47) 

d  =  xQ  -  x[  +  tfWj^]  (48) 

d  a  xQ  -  x[  +  G\z-h(x[)  +  ff^J-ff*,,]  (49) 

d  =  Tl-(?V]0eo-^J)  +  C^-hCrJ)]  (50) 
Now  to  show  that 

d  =  rVPo#1T  ^Vy1  +  BT1  ^Po]  gOc*)  (51) 
it  must  be  shown  that 
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[i  -  o V]  =  cp0-p/t  c#  ViT  +  ;?]"1  ^  V1  (52) 

and  that 

C?1  =  [Po-PJiiT  ifojF  +  *]"1  fflJ>0]  ff"*"1     •  (53) 

The  first  of  these  is  obvious  after  substituting  for  G    . 

iT 
In  the  latter  P  R 'is  factored  from  the  left  side  to  obtain 
o 


PoHiT  [I  -  {HiP0HiT  +  JlY1  FV/iT]  J?"  . 
and  then     [tf1?  #lT  +  P]"1  \HLPHt'L  +  *]      is  substituted  for  the  I  above. 

p  #iT  r^p  #iT  +  pi"1  iVp  #iT  +  P  -  fV/11]  p"1 

o        o  o  o 

After  cancellation,  Che  above  is  the  expression  for  G    . 

Note  the  matrix  PQ-p  #lT  [H±PJiXT  +  P]"1  PLP0  is  the  covariance  of 
the  updated  estimate  in  the  linear  case  so  it  will  be  given  the  special 
symbol 

Pl   ~  Po  "  P/1T  ^o^  +  P]"1  ^lpo  (54) 

so  that  (51)  can  be  written 

d  =  Pl8C*i)  (55) 

After  applying  the  overshoot  control  the  actual  change  of  the  iterate 
is  in  the  direction  of  d  but  may  have  a  smaller  magnitude.   Let  D  be  the 
actual  change  so  that 

D  =  qd  (56) 

D  =  qPlgC*J)  (57) 

where  q  is  some  integral  power  of  (1/2). 

Then  to  a  first- order  approximation  the  change  in  C,  AC,  is  given  by 

AC.  2gTD  (58) 

AC=  2q  gTd  (59) 
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AC?  2q  gTPlg  (60) 

AC=2q||g||2  (61) 

rl 

Since  all  higher  order  terms  in  the  expansion  of  AC  involve  higher  powers 

of  q  it  can  be  asserted  that  for  some  small  enough  q  the  first  order  term 

will  dominate.   Thus  for  some  suitable  q  the  change  in  C  will  be  negative 

for  any  non-zero  g  if  P     is  a  positive  definite  matrix.  P.    is  known  from 

the  linear  theory  to  be  positive  definite  for  any  value  of  H     and  any 

positive  definite  .ft.   This  also  follows  from  the  fact  that 

-1    -1    iT-ll 
Pxl  =  PQl   +  H   V?  V  (62) 

a  convenient  matrix  identity  discussed  in  r 1] ,  and  the  fact  that  the  in- 
verse of  a  positive  definite  matrix  is  positive  definite. 

It  has  been  shown  that  the  special  single- stage  problem  can  be  solved 
by  solving  a  sequence  of  simple  problems  which  approximate  more  and  more 
closely  the  real  problem.   Each  iteration  was  shown  to  reduce  the  cost 
unless  the  gradient  of  the  cost  was  zero. 

In  the  next  section  it  will  be  shown  that  the  general  problem  con- 
sidered in  this  method  can  be  recast  in  the  form  of  a  single-stage  prob- 
lem. 
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6.   Multi-stage  Case  Cast  in  Single  stage  Form. 

The  process  of  converting  a  multi-stage  estimation  problem  into  a 
single  stage  problem  is  accomplished  by  defining  the  new  state  to  be  the 
juxtaposition  of  the  states  at  the  various  stages.   The  process  will  be 
carried  out  in  detail  for  a  two- stage  problem  and  then  the  process  will 
be  generalized  by  induction  for  a  k-stage  process. 

Let  the  new  state  be 


X  = 


*(2) 


(63) 


let  the  new  estimate  be 


1   [*(2/2)| 


(64) 


let  the  a  priori  for  the  new  state  be 

and  let  the  a  priori  covariance  of  the  new  state  be 


(65) 


Po* 


P(l,l/0)   P(l,2/0) 


pl(l,2/0)\   P(2,2/0) 


166) 


Some  of  the  elements  in  x     and  F     have  not  been  previously  defined. 

However,  the  notation  used  has  already  been  defined,  i.e.,  #(2/0)  means 

the  estimate  of  the  second  state  given  no  data.   The  submatrices  in  P 

o 

are  defined  as  follows: 


P(i,j/0)  h  E[C*(i/0)  -  *(i»(*(j/0)  -*(j))T] 


(67) 


This  is  a  natural  extension  of  the  double  argument  notation  alreay  defined 
which  is  necessary  to  accomodate  consideration  of  the  cross  correlation 
between  states  at  various  times.   Since  these  new  elements  are  not  given 
directly  in  the  multi-stage  model  it  will  be  necessary  to  fill  in  these 
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elements  in  order  to  define  the  single-stage  model. 

First  consider  x(.2/0) .   Based  upon  the  requirement  that  (11)  be  sat- 
isfied it  must  be  true  that 

jc(2/0)  =  E[*(2)]   .  (68) 

Substituting  for  x(2)    from  (1)  and  taking  advantage  of  (2),  the  fact  that 
U)(l)  has  mean  zero,  yields 

*(2/0)  =  $>E[>(1)]   .  (69) 

Introducing  (11)  above  yields 

*(2/0)  =  <fc*(l/0)    .  (70) 

If  there  are  deterministic  inputs  (or  equivalently  Ef"u)(k)]  ^  0)  the 
appropriate  modification  is 

JC(2/0)  =:  £#(1/0)  +  T  El"u>(D]   .  (71) 

Now  consider  the  various  submatrices  of  P   .  p(l,l/0)  is  already 
known  in  the  multi-stage  problem  as  ?(l/0).  p(l,2/0)  is  given  by 

P(l,2/0)  =  ErCed/0)  -*(l))(*(2/0)  -jc(2))T]   .  (72) 

Substituting  (70)  and  (1)  in  the  last  part  of  (72)  yields 

P(l,2/0)  =  E[Cr(l/0)  -  *(1))(*jc  (1/0)  -  $#(1)  -  ru)(l))T]   .     (73) 

T 
Taking  advantage  of  the  independence  of  uu(l)  from  (2)  and  factoring  * 


yields 


P(l,2/0)  =  E[U(l/0)  -  *(l))C*(l/0)  -  je(l))T]*T  (74) 


but   this   is   just 


P(l,2/0)  =  P(1/0)<J>T      .  (75) 


By  similar  arguments  it  can  be  shown  that 

P(2,l/0)  =  *P(l/0)  (76) 

and 

P(2,2/0)  =  4>K1/0)4T  +  rrT   .  (78) 

The  remaining  elements  needed  to  complete  the  description  of  the  single- 

33 


stage  model  have  to  do  with  the  observation  process, 
Let  the  actual  observations  be 


(79) 


let  the  vector  of  observation  functions  be 

h(jf)  = 


[h(*(l))I 
[hC*<2))| 


(80) 


let  the  measurement  noise  be 


(81) 


and  let  the  measurement  noise  covariance  be 

"jf?  !    6 


B   = 


o .  n 


(82) 


where  the  off-diagonal  elements  of  R   are  zero  matrices  in  view  of  the 
whiteness  of  the  observation  noise. 

As  the  number  of  stages  is  increased,  the  dimension  of  the  single- 
stage  x   is  also  increased.   The  process  of  expanding  the  single-stage 
model  of  a  k  stage  model  to  that  of  a  (k  +  1)  -stage  model  is 
explained  in  detail  only  for  x,  x    ,  and  P    .   The  expansion  process  or  aug- 
mentation for  the  remaining  elements  of  the  model  is  as  simple  as  the  aug- 
mentation of  x   will  prove  to  be.   The  augmentation  of  x   is  shown  explicitly 
as  an  example. 

In  the  augmentation  process  x   goes  from 


r*an 


X   = 


*0O 


to 


*(1) 


*(k) 
*(k  +  1) 
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The  expansion  of  x     is  only  slightly  more  involved.   For  the  (k  +  1)- 
stage  case  the  a  priori  estimate  is 

*(l/0) 


*o  = 


*(k/0) 
*(k  +  1/0) 


(83) 


where  #(k  +  1/0)  =  <f«f(k/0) . 

The  structure  of  the  P     corresponding  to  the  (k  +  l)-stage  case  is  a 
(k  +  1)  by  (k  +  1)  square  matrix  whose  elements  are  submatrices.   The 
upper  left  k  by  k  part  of  this  matrix  is  already  known  from  the  p     assoc- 
iated with  the  k-stage  model.   Thus  to  expand  to  the  k  +  1  case  it  is  only 
necessary  to  fill  in  the  lower  border.  The  a  priori  covariance  P     has  the 


form 


^o  = 


P(l,l/0)      .   .   .  ?(l,k/0) 


P(k,l/0)     .   .   .  P(k,k/0) 


P(l,k  +  1/0) 


P(k,k  +  1/0) 


(84) 


P(k  +  l,i/0)  =  tP  (k,i/0)   for  1  £  i  £  k 
,T. 


P(k  ■»■  1,1/0)  .   .   .  P(k  +  l,k/0)   '  ?(k+  1,  k+  1/0 

The  formulas  below  for  the  new  border  elements  of  p     were  derived  in  ex- 

o 

actly  the  same  way  the  submatrices  P(l ,2/0) ,  p(2,l/0)  and  P(2,2/0)  were 
found  in  the  case  for  k  +  1  =  2. 

(85) 
P(i,k  +  1/0)  =  P*(k  +  l,i/0)   for   1  £  i  £  k  (86) 

P(k  +  l,k  +  1/0)  =  $P  (k,k/0)  $T+rrr  (87) 

Thus  it  is  clear  that  the  dynamic  relations  represented  by  (1)  for  the 
multi-stage  problem  are  incorporated  into  the  very  special  structure  of 
the  large  covariance  matrix  P     in  the  single-stage  equivalent. 

After  a  given  multi-stage  problem  has  been  cast  in  single-stage  form, 
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the  single-stage  problem  is  solved  using  the  methods  previously  described. 
The  outcome  of  the  single-stage  solution  is  the  best  estimate,  x+,  which 
must  then  be  interpreted  in  terms  of  the  original  multi-stage  problem. 
The  best  estimate,  x-,,    is  the  best  estimate  of  all  states  given  all  data 
up  to  the  present  stage,  i.e. 

*<l/k)| 


*1  = 


(88) 


*(k/k) 

Finally,  as  each  new  single-stage  solution  process  is  started  there 
must  be  an  inital  estimate  of  x-,    which  is  called  x-,    •   The  first  iterate 
for  a  (k  +  l)-stage  minimization  will  be  based  on  the  final  estimate  from 
the  previous  minimization  over  k  stages.   Explicitly,  the  first  iterate  is 
given  by 

jcd/k) 


*1- 


JC(k/k) 
*(k  +  1/k) 


(89) 


i 


where  *(k  +  1/k)  -  $r(k/k)  (90) 

At  this  stage  in  the  development  of  the  filter,  the  problem  has  been 
solved  but  in  a  very  impractical  way.   The  filter  has  been  broken  into  an 
unlimited  sequence  of  minimization  problems.   Each  minimization  problem 
is  solved  through  an  as  yet  unlimited  sequence  of  approximate  solutions. 
In  order  to  design  a  practical  filter  a  realistic  convergence  test  must 
be  used  to  terminate  the  minimization  process.   Such  a  convergence  test 
is  discussed  in  the  next  section.  Another  difficulty  arises  in  connection 
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with  the  sequence  of  minimization  problems.   As  more  and  more  data  are 
considered,  spanning  a  larger  and  larger  collection  of  states,  the  size 
of  the  minimization  problem  increases   A  practical  means  of  limiting 
the  size  (dimensionality)  of  the  minimization  problem  will  be  discussed 
in  a  subsequent  section. 
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7.   Criteria  for  Termination  of  the  Minimization  Procedure. 

It  seems  to  be  a  universal  feature  of  any  iterative  numerical  method 
that  the  termination  decision  is  based  upon  "feel"  or  "rules  of  thumb". 
Typically,  a  quantity  is  chosen  as  a  measure  of  the  convergence  of  the 
iterative  process  and  this  measure  is  compared  against  a  standard  or 
threshold.  Fortunately,  in  this  particular  case  it  is  possible  to  offer 
some  insight  into  the  choice  of  both  the  measure  and  the  standard  or 
threshold.  This  is  true  because  the  problem  is  basically  a  stochastic 
one.  By  analogy  with  certain  special  cases  it  is  possible  to  approximate 
the  probability  density  function  of  the  measure  of  interest. 

The  control  law  or  algorithm  for  the  control  of  the  number  of  itera- 
tions is  based  upon  the  fact  that  the  minimum  cost,  GO?.)*  is  itself  a 
random  variable  whose  distribution  function  caw.  be  approximated  by  that 
of  a  chi  square  random  variable.   The  minimum  cost  is  exactly  distributed 
as  a  chi  square  variable  when  the  measurements  are  linearly  related  to 
the  states  and  all  of  the  random  variables  are  Gaussian.  The  chi  square 
distribution  is  characterized  by  a  parameter  called  the  number  of  degrees 
of  freedom.  For  the  least  square  problem  the  number  of  degrees  of  free- 
dom is  the  number  of  constraint  equations  less  the  number  of  parameters 
adjusted  in  the  process  of  minimizing  the  sum  of  the  sqwares . 

Using  this  assumed  distribution  function  for  the  minimum  value  of  C 
it  is  possible  to  evaluate  numerically  the  probability  of  a  minimim  C  being 
greater  than  some  number  C  •   Actually  the  question  is  reversed  so  that  a 
C.  is  found  such  that  the  probability  that  a  minimum  C  is  greater  than  C. 
is  some  small  number  ot .     This  number  C  (<*)  is  then  used  as  a  threshold  for 
comparison  with  the  actual  value  of  G  after  «ach  iteration.   If  C  is 
greater  than  C.  the  process  is  reiterated  on  the  assumption  that  it  is 
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very  improbable  that  this  value  of  C  is  in  fact  a  minimum. 

Clearly  using  such  a  test  will  reject  a  certain  number  of  true  mini- 
mum calues  of  C.   For  this  small  percentage  (<*)  of  cases,  the  stopping 
criteria  is  based  upon  the  relative  change  in  state  estimates.   When  the 
relative  change  in  each  component  of  the  state  vector  after  an  iteration 
is  less  than  some  small  number,  EPS,  the  successive  estimates  are  con- 
sidered to  be  equal  and  the  process  is  terminated. 

On  the  other  hand,  passing  this  first  test  does  not  assure  that  a 
minimum  has  been  reached.   For  this  reason,  a  second  test  is  prescribed 
which  specifies  a  minimum  improvement.   Choosing  the  minimum  improvement, 
C..,  may  be  considered  purely  arbitrary.   On  the  other  hand  it  may  be 
helpful  to  invoke  a  statistical  interpretation  to  aid  in  choosing  Cw. 
Such  an  interpretation  exists  if  all  of  the  random  sequences  are  Gaussian. 
Then  C(x   )  is  related  to  the  likelihood  of  x     and  C(.r  )~C(x       )  is  related 
to  the  likelihood  ratio.   The  likelihood  ratio  is  a  common  statistic  used 
to  test  the  significance  of  the  difference  between  two  estimates. 

The  difference  is  considered  to  be  significant  at  the  3  level  if  the 
probability  of  occurrence  of  the  observed  likelihood  ratios  is  less  than 
3  under  the  assumption  that  X     is  the  true  state.   The  test  of  statisti- 
cal significance  is  made  by  setting  a  threshold  on  the  likelihood  ratio, 
or  some  function  of  it.   The  difference,  Q,(x.)-Q,(x.      ),  is  minus  twice  the 
likelihood  ratio  and  has  a  chi  square  distribution  with  the  number  of  de- 
grees of  freedom  equal  to  the  number  of  components  in  the  state,  x.      C   it 
chosen  as  that  value  for  which  the  probability  of  a  chi  square  variable 

less  than  C   is  3 .   The  test  is:   if  C(Jf  )-C(x       )  is  greater  than  C  re- 
L  L 

iterate,  otherwise  terminate  the  procedure.   The  satisfaction  of  the  test 
suggests  the  interpretation  that  the  last  two  estimates  are  not  significantly 
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different  so  no  further  iteration  is  carried  out. 

The  convergence  test  described  may  be  summarized  in  terms  of  three 

quantities  involved  in  the  minimization  process  and  three  thresholds. 

The  three  quantities  are  1)  r,  the  maximum  absolute  relative  change  in 

any  component  of  the  state  vector  from  one  iteration  to  the  next ,2)  C  , 

the  current  value  of  the  cost  and  3)(C   -C  ),  the  change  in  the  cost  over 

the  last  iteration.   The  corresponding  three  threshold  parameters  are 

EPS,  Cw,  C  .   The  decision  rules  for  terminating  or  continuing  the  pro- 
N    L 

cess  are  displayed  in  Figure  1. 


Figure  1.   Flow  graph  of  the  criteria  for  iteration  termination. 
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As  an  example,  consider  a  single  stage  minimization  where  the  state 
has  six  components  and  the  measurement  has  four  components.   The  pro- 
bability that  a  chi  square  variable  having  four  degrees  of  freedom  will 
have  a  value  greater  than  14.9  is  0.005.   This  suggests  that  if  C  *  14.9 

Id 

only  one  true  minimum  out  of  200  will  be  rejected  by  this  test.   The  prob- 
ability that  a  chi  square  variable  with  six  degrees  of  freedom  will  be 

less  than  0.872  is  0.01.   If  C(^i)-C(Jf  "  )-C  =.872  the  last  two  iterations 

M 

are  not  significantly  different  at  a0.99  confidence  level.  Finally, for 
those  unusual  cases  where  the  true  minimum  is  greater  than  14.9  the 
iterations  are  continued  until  the  iteration  values  are  the  same  within 
the  limitations  of  computer  word  length.  For  the  CDC  1604  the  floating 
operations  carry  about  ten  significant  digits.  It  should  be  considered 
that  absolute  convergence  has  been  attained  when  the  relative  change  in 

all  of  the  components  of  the  estimate  do  not  change  more  than  one  part  in 

9  -9 

10  .   This  indicates  a  choice  for  EPS  of  10   . 

The  remarks  of  this  section  were  directed  toward  providing  some  in- 
sight into  the  choice  of  the  threshold  parameters.  While  the  assumptions 
which  would  make  these  interpretations  rigorous  may  in  most  cases  be 
lacking,  the  filter  designer  must  incorporate  a  convergence  test,  i.e.,  he 
must  choose  a  set  of  parameters.   The  interpretations  discussed  above  are 
offered  as  an  aid  toward  choosing  an  efficient  set  of  parameters. 
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8.    Criteria  for  the  Number  of  Smoothing  Stages  to  be  Carried. 

As  a  result  of  the  way  in  which  the  best  estimate  has  been  defined, 
the  complexity  of  the  algorithm  increases  with  the  number  of  stages  over 
which  data  is  available.   The  estimate  of  state  at  the  initial  time  is 
the  result  of  a  minimization  involving  n  (number  of  components  in  the  state 
vector)  variables.   The  estimate  of  the  state  at  the  time  of  the  twelfth 
measurement  would  be  the  result  of  a  minimization  over  12n  variables. 
Clearly,  proceeding  in  this  way  the  computational  requirements  will  ex- 
ceed the  capabilities  of  any  computer  after  a  finite  number  of  stages. 

The  work  of  Denham  and  Pines  [5]  has  focused  attention  on  the  dif- 
ference between  the  case  where  the  measurements  are  linear  in  the  states 
and  the  more  general  nonlinear  case.   This  difference  was  shown  to  be  the 
result  of  neglecting  higher-order  terms  in  the  expansion  of  the  measure- 
ment function.   The  minimization  over  many  stages  may  be  viewed  as  a  means 
of  avoiding  this  problem  since  each  linearization  is  only  tentative.   As 
new  data  become  available,  providing  more  information  about  the  old  states, 
the  linearization  of  the  measurement  functions  becomes  more  accurate. 
When  the  state  is  known  well  enough  so  that  the  second-order  terms  are 
negligible,  the  linearized  measurement  is  considered  to  be  accurate.   This 
approach  will  be  developed  into  a  criterion  for  the  number  of  smoothing 
stages  which  must  be  carried  in  the  next  minimization  process. 

Consider  a  second-order  expansion  of  a  single  nonlinear  observation 


function  about  a  point  x    • 


z  -  hCr°)  +  h  Cx°)(jc-x°)  +  h<jc~x°)     h    fce°)  Cr-*°)  +  v  (91) 

•*  XX 

where  h  Of  )  is  a  row  vector  of  partial  derivatives  of  h  evaluated  at 

X 

X     and  h   (x  )  is  the  symmetric  matrix  of  second  partials  of  h  evaluated 
XX 

o 
at  x   • 
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In  order  to  form  a  synthetic  observation,  z    ,   of  the  form  of   (16), 
the  synthetic  observation  must  be  a  linear   function  of  the   true  state 
with  added  noise  which  has   zero  mean. 

Z°  *  z   -  hOf°)  +  h  C*°)  *°  -  b  (92) 

z°=Hx  +  v  +  v%  (93) 


where  H  =   h   Oe  )   and 
X 


b  =   h  Er0f-J«fO)   b<X°)<X'X°)]  (94) 

XX 

and  v'    is  the  variation  of  the  second -order  term  about  its  mean.b. 

The  added  noise  V1    is  considered  to  be  the  random  noise  caused  by 
the  linearizing  process.  The  magnitude  of  this  nonlinear  noise  can  be 
measured  in  terms  of  its  variance,  ft' .      Expressions  for  evaluating  b 
and  J?1  are  developed  in  Appendix  II. 

The  process  of  dropping  a  stage  of  smoothing  is  a  lumping  operation. 
It  can  be  seen  by  examining  the  equations  for  the  linear  filter  that  all 
past  data  is  brought  forward  in  time  through  (17)  and  (21).  This  is  not 
done  immediately  in  the  nonlinear  filter  because  the  observation  equations 
have  been  linearized  at  a  point  which  may  be  quite  different  from  the  true 
state.   By  keeping  several  stages  active  in  the  filter  it  is  possible  to 
perform  the  linearization  at  a  point  much  closer  to  the  true  state.  The 
dropping  of  a  stage  should  be  accompanied  by  a  high  degree  of  confidence 
that  the  last  linearization  was  performed  at  a  point  close  to  the  true 
state.  That  is,  H   is  not  going  to  change  significantly  as  better  esti- 
mates of  the  true  state  become  available.  The  invariance  of  H   is  related 

to  the  expression  f  or  P, '   =  %:race[h  Of  )P]   since  h  (x   )  represents  the 

XX  XX 

variability  of  H  with  x   and  P   represents  the  variance  of  x   •  Thus  when 
the  natural  noise  in  each  element  of  the  observation  vector  for  a  given 
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stage  is  an  order  of  magnitude  greater  than  the  nonlinear  noise,  the 
stage  corresponding  to  observation  no  longer  carried. 

The  mechanics  of  lumping  are  quite  straight  forward  once  it  has 
been  decided  that  the  current  linearization  is  a  good  approximation. 
Under  these  conditions,  the  Kalman  sequential  filter  equations  are  dir- 
ectly applicable.   If  these  equations  are  applied  to  those  stages  which 
are  to  be  lumped,  the  result  will  be  an  a  priori  estimate  of  the  first 
state  which  is  not  to  be  lumped.   The  states  which  have  been  lumped  no 
longer  appear.  All  of  the  information  that  these  states  carried  with 
respect  to  the  estimation  of  future  states  is  characterized  by  the  a 
priori  estimate  of  the  first  state  which  is  not  lumped.   From  this  point 
then,  the  problem  has  exactly  the  same  structure  as  the  problem  before 
lumping  except  that  there  are  fewer  stages  being  carried. 
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9.   Computation  of  the  Solution. 

The  basic  features  of  the  algorithm  are  shown  in  Figure  2.   The 
following  is  a  brief  resume  of  the  quantities  which  must  be  computed  in 
order  to  implement  the  filter.   Consider  the  overshoot  decision.   In  or- 
der to  decide  whether  an  overshoot  has  occurred  it  will  be  necessary  to 
evaluate  the  cost,  C.   For  this  purpose  the  multi-stage  expression  (12) 
is  more  convenient  than  the  single  stage  expression  (121).   To  test  for 
convergence  it  is  necessary  to  have  the  current  cost,  the  previous  cost, 
the  current  estimate,  the  previous  estimate  and  the  two  parameters  C 
and  Cu  which  are  functions  of  the  number  of  smoothing  stages  currently 
being  carried.   The  decision  on  the  number  of  stages  to  be  carried  de- 
pends upon  an  evaluation  of  the  nonlinear  noise  associated  with  each 
observation.   In  order  to  evaluate  this  nonlinear  noise,  the  matrix  of 
second  partial  derivatives  of  the  observation  functions  must  be  computed 
at  the  current  best  estimate  of  the  true  state.   In  addition,  there  must 
be  available  a  covariance  matrix  representing  the  uncertainty  of  this 
estimate. 

Computation  of  the  Cost 

In  general  there  are  many  means  of  computing  a  given  quantity.   It 
turns  out  that  the  expressions  for  some  quantities  are  useful  in  discus- 
sing the  problem  but  are  not  the  most  efficient  in  actual  computation. 
Such  is  the  case  with  the  cost  C.   For  discussion  purposes  the  cost  was 
expressed  in  terms  of  the  single-stage  state  variable.   For  computing 
the  cost  at  an  actual  estimate,  however,  the  form  of  (12)  is  more  effic- 
ient. 

The  first  term  is  handled  as  follows: 

||*(l/0)  -  *<l/k)||*  =  ||B(x<l/0)  -  *(l/k))||2  (95) 
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Figure  2.   Flow  graph  of  overall  procedure. 
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where  W,  =  P  (1/0)  (the  pseudo  inverse  of  P(l/0))  and  B  is  chosen  so  that 
the  equality  holds  which  implies  that  B  satisfies 

BTB  =  P#(l/0)   .  (96) 

A  suitable  B  is  found  by  a  special  routine  which  begins  by  decomposing 
P(l/0)  into  the  form 

P(l/0)  =  AAT  (97) 

where  A  is  an  n  x  r  matrix  and  r  is  the  rank  of  P(l/0).  If  r  =  n  then 

PV/O)  =  P_1(l/0)  and  (98) 

B  =  A"1   .  (99) 


If  r  £  n  then 


P#(l/0)  =  ATV  (100) 


#      #  #     T   -1  T 

which  implies  that  B  =  A  and  A  can  be  computed  by  A  =  (A  A)  A  ,  where 

T 
the  indicated  inverse  is  known  to  exist  by  construction.   That  is,  A  A  is 

r  x  r  and  has  rank  r  so  it  is  non-singular.   The  routine  which  computes  A 

also  computes  A   if  it  exists,  with  only  minor  additional  labor. 

For  most  applications  P(l/0)  will  not  be  singular  even  though  the 
single -stage  covariance  will  be  singular  if  T  has  rank  less  than  the 
system  order.   It  is  for  this  reason  that  the  procedure  adopted  has  a 
built-in  flexibility  to  handle  the  singular  case  but  handles  the  non- 
singular  case  with  virtually  no  loss  of  computational  efficiency  over  the 
more  conventional  approach  of  inverting  directly  the  covariance  matrix 
P(l/0). 

Evaluating  the  typical  second  term  in  (12)  is  accomplished  in  a 
similar  fashion 

|U(i/k)  -  4*<i-l/k)||j|  =  Hs^U/k)  -  S^d-l/k)^  (101) 

where  W9  =  (T  T   )  •   Since  T   and  $  are  assumed  to  be  constant  throughout 
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the  problem  S,  and  S2  can  be  computed  beforehand.  By  comparing  terms 

s^-  (rrV  d°2) 

and  S2  =  S^  .  (1Q3) 

There  is  no  loss  of  generality  in  assuming  that  r  has  full  rank,  i.e., 
rank  equal  to  its  minimum  dimension.  Under  these  conditions 

cr  rT)#  =  rT#r*  (io4> 

where  T#  =  (fy)'V     •  This  implies  that 

sx  =  O^D'V1  (105) 

and  S2  =  (rTT)"1rT*  (106) 

where  the  indicated  inverse  is  known  to  exist. 
Finally  the  third  typical  term  of  (12)  is 

||*<k)-h<r<i/k))||?  (107) 

W3 

where 

w3  =  fl~l 

The  procedure  used  to  evaluate  this  term  assumes  that  the  measurement 
errors  are  independent,  i.e.,  ft   is  diagonal.  While  this  is  a  realistic 
assumption  for  most  real  problems  there  are  means  similar  to  those  al- 
ready used  to  handle  cases  where  the  observation  errors  at  any  time  are 
correlated  with  one  another,  i.e.,  R   is  not  diagonal.  The  details  will 
not  be  discussed  for  lack  of  physical  motivation. 

This  computation  is  carried  out  in  the  computer  subroutine  COST. 

The  auxiliary  matrices  B,  S.    and  S~  are  computed  outside  the  subroutine. 

T 
The  matrix  decomposition  routine  which  generates  A  such  that  AA  ■  P(l/0) 

is  called  DECOMA. 
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The  Basic  Minimization  Procedure 
Another  procedure  which  is  computed  by  a  different  method  than  that 
used  in  the  development  is  the  minimization  process  (43).   The  computa- 
tional procedure  is  a  step-by-step  solution  of  the  linearized  single- 
stage  problem.   This  single-stage  problem  is  viewed  as  involving  a  se- 
quence of  observations.   Each  observation  is  combined  in  turn  to  produce 
a  new  estimate  of  the  state  and  a  new  covariance  of  that  state.   Figure 
3  shows  the  details  of  the  step-by-step  procedure.   At  any  point  in  the 
procedure  the  best  estimate,  given  the  data  processed,  is  x   and  the  assoc- 
iated covariance  is  p.     This  type  of  sequential  processing  is  valid  under 
the  assumption  that  the  observation  errors  are  mutually  independent.   In 
the  computer  program  this  process  is  carried  out  with  the  observations 
divided  into  blocks  according  to  the  time  of  the  observation.   The  sub- 
routine KALF1L  processes  each  block  in  the  manner  indicated  by  Figure  3. 
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Figure  3.   Flow  graph  of  step-by-step  minimization  procedure. 
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This  routine  is  entered  once  for  each  block  or  once  for  each  smoothing 
stage  carried. 

This  process  yields  the  useful  result  that  the  vector  x   after  j 

blocks  is  composed  of  the  sequences  of  state  estimates  x(l/j),  Jf(2/j), 

2 
...,  Jf(k/j).   Similarly  the  matrix  p   is  composed  of  k  submatrices  of  the 

form 

P(l,l/j)  P(l,2/j)    .    .   P(l,k/j) 

Pa   P(2,l/j) 

P(k,l/j)    •    •         •   P(k,k/j) 

It  is  at  this  point  that  a  lumping  operation  takes  place.   If  it  has  been 
decided  that  all  of  the  observations  up  to  and  including  the  j   stage 
have  negligible  nonlinear  noise,  then  a  lumping  operation  is  performed. 
This  consists  of  shifting  all  of  the  estimates  x   up  and  out  and  shifting 
the  p  matrix  up  and  to  the  left.  This  has  the  effect  of  eliminating  any 
reference  to  any  state  at  time  j  or  earlier  and  reducing  the  dimension  of 
the  single-stage  state  x   and  the  single-stage  covariance  p. 
*(l/2) 


X  = 


*(2/2) 
*(3/2) 
*(4/2) 

m  *J 


AFTER 


SHIFTING 


-^->X  - 


*(3/2)1 
*(4/2) 


SHIFTING 
TIME  INDEX 


•>»  X  - 


JC(l/0) 
*(2/0) 


P  = 
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P(l,l/2)     P(l,2/2)  P(l,3/2)  P(l,4/2) 
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P(l,l/0)  P(l,2/0) 
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Figure  4  Evolution  of  the  estimate  and  covariance  during  the  transition 
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The  process  of  shifting  in  the  computer  produces  an  automatic  change  in 
indices.   The  process  is  shown  as  two  step  for  an  example  where  j  s  2, 
k:  4  in  Figure  4.   The  subroutine  SHIFT  performs  the  details  of  shifting 
the  matrices  as  indicated  above  as  well  as  several  other  matrices  which 
must  be  shifted. 
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10.   A  Simple  Example. 

Unfortunately  examples  seem  to  fall  In  two  mutually  exclusive  cate- 
gories, enlightening  or  realistic.   The  following  example  is  introduced 
to  illustrate  the  mechanical  details  of  the  algorithm.   This  example 
also  illustrates  the  process  of  abstracting  the  mathematical  model  from 
the  physical  situation.   Consider  an  active,  drunken,  tight  rope  walker. 
He  is  put  on  a  tight  rope  so  that  only  one  coordinate  will  be  needed  to 
specify  his  position  which  will  be  designated  r(k).  A  drunken  person  is 
considered  in  order  to  introduce  the  concept  that  his  position  is  a  ran- 
dom quantity. 

It  will  be  assumed  from  previous  experience  with  drunken  tight  rope 
walkers  that  his  next  position  is  different  from  his  last  position  by 
some  completely  random  variable.   The  mean  squared  value  of  this  dif- 
ference is  assumed  to  be  known.   Further  it  is  assumed  that  his  ramblings 
to  the  right  are  balanced  in  the  long  run  by  those  to  the  left,  i.e.,  they 
have  zero  mean.   The  mathematical  model  for  this  much  of  the  physical 
situation  is  given  below. 

x(k+l)  -  x(k)  +  ou(k)  (108) 

EO(k)]  =  0  (109) 

t[-<k)-(j)]  -  {§  £  IZ  .  <110> 

For  concreteness  Q  is  taken  as  0.02. 

The  first  expression  is  usually  said  to  be  the  model  of  a  dynamic 
system  excited  by  white  noise.   The  second  and  third  are  quantitative  de- 
scriptions of  the  noise. 

The  next  part  of  the  situation  that  must  be  described  is  the  process 
by  which  data  are  obtained.   Here  the  concept  of  a  nonlinear  measurement 
is  introduced.   An  angular  measurement  is  made  at  some  fixed  distance 
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from  the  tight  rope,  i.e.,  the  observer  turns  his  head  through  a  certain 
angle.   For  simplicity  the  observer  is  placed  opposite  the  center  of  the 
tight  rope  at  a  distance  of  one  unit.   It  will  be  assumed  that  the  ob- 
server can  sense  the  angular  deflection  of  his  head  with  a  standard  de- 
viation of  0.1  radians.   The  mathematical  model  for  the  observer  is  given 
below. 

*(k)  =  tan_1(j*r(k))  +  u(k)  (111) 

EO(k)]  »  0  (112) 

«[»<k)»a)]  -  {*  £  f*  (us) 

and  R   has  been  taken  as  0.01. 

Finally  the  a  priori  data  must  be  specified.   For  this  example  the 
use  of  an  artificial  a  priori  will  be  illustrated.   The  physical  situation 
is  such  that  before  taking  any  data  there  is  essentially  no  information 
available  about  position  of  the  tight  rope  walker.   This  fact  is  model- 
led by  taking  the  a  priori  estimate  as  zero  and  assigning  a  very  large 
variance  to  this  estimate,  say  10,000. 

The  structure  is 

*(l/0)  =  0  (114) 

i>(l/0)  =  10,000  (115) 

This  completes  the  mathematical  model  of  the  physical  process  underlying 

the  observations.   Assume  that  the  first  two  observations  are  0.7854  and 

0.900  radians.   Using  these  observations  the  computations  required  by 

the  filter  are  described  in  detail. 

The  method  described  in  Section  5  for  the  single-stage  case  is 

applied  to  this  example  for  the  first  observation.   The  first  estimate 

of  the  state  is  the  a  priori  estimate  x.    *  0.   The  partial  derivative 

1  12 

of  the  measurement  is  evaluated  to  find  H     ■  l/[l+(^1)  ]  ■  1  for  the 
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first  iteration.  Applying  (44)  yields 

Gl   -  (1)-(10,000)/[(1).(10,000)°(1)  +  (,QI)]  -  1. 
Next  Z     is  computed  from  (40) , 

Z1   -  .7854- tan" X(0)  +  (1)'(0)  -  0.?®54„ 

From  (43) 

2 
X.    (not  an  exponent)  -  0  +  (1)'(7854  -  (1)°(0))  "  0« 

In  Table  I  the  results  of  repeating  this  procedure  thxee  times  are  shown. 

Also  shown  in  the  table  is  the  cost  associated  with  each  estimate, 

including  the  cost  for  the  a  priori.  From  a  table  for  the  chi  square 

variable  the  threshold  values  are  found  to  be  C, (0.05)  °  3„84  and 

L 

C  (0.95)"  0.004.  Both  of  these  are  for  a  single  degree  of  freedom.   C 

M  Li 

has  one  degree  of  freedom  since  there  are  two  constraints,  the  a  priori 
and  the  observation;  less  one  adjustable  quantity,  the  single  component 

of  the  state  estimate.  The  C  also  has  a  single  degree  of  freedom  since 

M 

there  is  only  one  element  in  the  state  vector c  From  Table  I  it  can  be 

2 
seen  that  the  cost  associated  with  x-t   might  be  considered  a  minimum  cost, 

but  there  has  been  a  significant  decrease  in  the  cost  so  the  process  is 
repeated.  Considering  the  last  iteration  it  may  be   said  that  lo0  is  not 
a  significantly  better  estimate  of  the  true  state  than  0.9767.,   This 
terminates  the  first  minimization  process . 

It  is  interesting  to  note  that  the  device  of  taking  a  large  a  priori 
variance  has  led  to  the  expected  result  that  #(1/1)  converges  rapidly 
to  tan  D?(l)]  »  1.0.  It  may  occur  to  the  render  that  this  is  the  hard 
way  to  evaluate  tan  [<?(1)3,  but  the  advantage  of  general  applicability 
of  the  method  outweighs  the  advantages  of  considering  special  cases.   In 
any  case,  the  machinery  for  handling  a  priori  information  mw.st  be  avail- 
able in  order  to  implement  the  lumping  procedure. 
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TABLE   I 
EXAMPLE  OF   SINGLE-STAGE  MINIMIZATION  PROCESS 


A  PRIORI   ESTIMATE  -   0.0 
A  PRIORI  VARIANCE  -   10,000 


OBSERVATION  -    .7854 


OBSERVATION 
ERROR 
VARIANCE 


.01 


ITERA- 

OLD 

NEW 

TION 

ESTI- 

ESTI- 

COST 

NUMBER 

MATE 

MATE 

1 

i 
*1 

>/" 

c1 

,* 

i+1 
*1 

CC*1) 

1 

0.000 

1.000 

1.000 

0.7854 

0.7854 

61.68 

2 

0.7854 

0.6185 

1.617 

0.6042 

0.9767 

1.40 

3 

0.9767 

0.5118 

1.954 

0.5121 

1.0000 

0.015 

4 

1.0000 

.00001 
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Next  the  bias  and  variance  of  the  nonlinear  noise  are  evaluated  to 
determine  whether  a  lumping  operation  is  indicated.  The  nonlinear  noise 
depends  upon  the  variance  of  the  new  estimate  and  the  second  derivative 
of  the  observation  function  evaluated  at  the  estimate.  The  variance  of 
this  estimate  is  computed  from  (54): 

P  -  10,000  -  (10,000)2/(40,000  +  0.01)  -  0,04. 
Using  this  variance  the  lumping  test  criterion  can  be  computed  from 
Appendix  II.   The  bias  due  to  the  second  order  terms  is  0.01  and  the  vari- 
ance of  the  second  order  term  is  0.0001.   Comparing  this  with  the  variance 
of  natural  noise  (observation  errors)  it  is  noted  that  there  is  an  order 
of  magnitude  difference  and  a  lumping  operation  would  normally  take  place. 

For  this  example  the  first  stage  will  not  be  lumped  so  that  the  de- 
tails of  the  two- stage  minimization  process  may  be  illustrated.   If 
lumping  had  taken  place  the  estimate  would  be  projected  forward  to  form 
the  a  priori  estimate  for  the  next  time  frame.   Since  &  =  1  for  this 
simple  dynamic  system  the  new  a  priori  is  just  the  old  best  estimate. 
From  (21)  the  variance  of  the  a  priori  estimate  is  .06.   To  obtain  the 
estimate  of  the  position  of  the  tight  rope  walker  at  the  second  time 
frame  one  proceeds  exactly  as  above  using  the  new  a  priori  information. 

In  order  to  solve  the  two-stage  minimization  problem  the  problem 
is  reduced  to  a  single-stage  problem.   The  elements  of  the  single-stage 
state  are  the  position  at  the  first  time  and  the  position  at  the  second 
time.   The  relations  described  in  Section  6  are  used  to  generate  a  com- 
plete single-stage  problem  having  a  two-dimensional  state. 


x 


X, 


(116) 
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10000    10000 
10000    10000.02 

0.7854 

0.9000 

,01    0 
0     .01 


h(x) 


h1(0P) 


h2  (  x) 


tan  (jr.) 


tan  (r2) 


(117) 


(118) 


(119) 


(120) 


The  above  is  a  new  single-stage  minimization  problem  and  conceptually,  it 
is  solved  in  exactly  the  same  way  that  the  previous  (one  dimensional)  pro- 
blem was  solved,  i.e.,  by  repeated  application  of  (43).   As  a  practical 
matter  it  is  expedient  to  adjust  the  estimate  separately  for  each  ele- 
ment in  Z.     This  is  possible  since  the  errors  in  the  observations  are  in- 
dependent.  This  is  true  in  general  because  the  errors  were  assumed  to  be 
white  in  the  multi-stage  problem. 

Each  iteration  proceeds  in  two  steps.   First  both  elements  of  the 
estimate  are  adjusted  for  the  first  element  of  z   and  then  the  resulting 
adjusted  estimates  are  adjusted  for  the  second  element  of  z.     See  Figure 
3.   The  result  of  the  first  adjustment  is  already  known  and  need  not  be 
computed.   The  first  element  of  this  intermediate  result  is  the  best 
estimate  from  the  previous  minimization  process.   The  second  element  is 
just  the  predicted  estimate  *(2/l)  a  *r  (1/1)  ■  1.0  from  (83).  The  in- 
termediate covariance  is  computed  from  (85),  (86),  and  (87)  and  known 
value  of  P(l/l). 

1 


inter. 


1.0 
1.0 


(121) 
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inter. 


0.04   0.04 


0.04   0.06 


(122) 


Starting  from  this  intermediate  result  the  adjustment  to  the  esti- 

1 
mate  for  the  second  measurement  is  computed.  H    is  now  a  row  vector  of 

partials  of  the  second  observation  function  with  respect  to  both  elements 
of  the  state  vector:  M     -  [_0.   0.50J  .  Note  that  the  zero  in  H     is  a 
general  result  of  the  way  in  which  the  problem  is  formulated.   The  measure- 
ment function  is  formally  a  function  of  the  whole  state  x  although  it  is 
clear  from  the  construction  of  the  single-stage  form  that  each  individual 
element  of  the  measurement  function,  h(v),  is  a  function  of  the  state  of 
the  system  at  only  one  time.   From  (44)  the  adjustment  coefficients,  G   , 
are  obtained. 

^1 


.8 
1.2 


(123) 


Applying  (40)  the  synthetic  observation  is  found  to  be 
2l   =  .9  -  tan"1(l)  +  (.5).  (1)  -  .6146 


and  the  adjusted  estimates  are 


1.0917 
1.1375 


(124) 


The  cost  for  the  intermediate  estimate  (121)  and  the  above  (124) 

2 
estimate,  .*.,  are  1.313  and  .552  respectively.   Since  these  are  a  sum  of 

four  squared  residuals  with  two  adjustable  quantities  the  convergence  — 

test  parameters  are  different.   C  (.05)  -  5.99  and  C  (.05)  -  .103.   Based 

L  N 

on  these  parameters  it  may  be  inferred  that  the  above  estimate  is  signifi- 
cantly better  than  the  intermediate  estimate.   A  second  iteration  will  be 
carried  out. 
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The  minimization  is  accomplished  in  two  steps.  The  first  step  is  to 
adjust  the  a  priori  vector  for  the  first  observation.  The  difference  be- 
tween this  step  and  the  single  stage  minimization  is  that  the  partial 

2 
derivatives  are  evaluated  at  x,   given  above  in  (124).   The  result  of 

this  step  is  comparable  to  the  previous  intermediate  estimate  in  (121) 

and  (122) 


inter 


.9951 
.9951 


(125) 


inter. 


.048   .048 
.048   .068 


(126) 


The  second  step  is  to  adjust  this  intermediate  estimate  for  the  second 


ob 


servation.   The  partial  derivative,  H>    is  to  be  evaluated  at  jr,  and  not 


x   .  .   The  result  of  this  second  adjustment  is 

inter  J 


1.0978 
1 .  1405 


(127) 


The  cost  evaluated  at  this  estimate  is  .539.   The  convergence  tests  in- 
dicates that  the  last  estimate  is  not  significantly  better  than  the 
previous  one.   This  minimization  is  said  to  have  converged. 

After  it  has  been  decided  that  the  process  has  converged  it  is  neces- 
sary to  evaluate  the  second-order  terms  in  the  expansion  of  the  nonlinear 
measurement  function.   For  this  purpose  it  is  necessary  to  have  the  co- 
variance  of  the  last  estimate.   This  covariance  is  automatically  computed 
by  the  method  displayed  in  Figure  3. 

.02096    .02096 


.02096    .02966 


(128) 
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From  Appendix  II  the  expected  value  of  the  second  order  term,  denoted 
as  the  bias  or  simply  b,  is 

.0048 


.0065 
and  the  variance  about  this  expected  value  is 


(129) 


/?'  - 


,000023 


.000042 


(130) 


Assume  that  only  the  first  of  the  measurements  has  negligible  second- 
order  terms.  Then  a  lumping  operation  is  indicated.  This  might  be  ac- 

2 
complished  by  noting  the  two-stage  interpretation  of  X  .  and  P 


x 


inter. 


*(1/D 
*(2/l) 


(131) 


and 


inter. 


P(l,l/1)   P(l,2/1) 
P(2,l/1)   P(2,2/l) 


(132) 


The  lower  element  of  x.  and  the  lower-right  element  of  P.     consti- 

inter  °  inter 

tute  the  a  priori  information  for  the  state  at  the  second  time  frame. 

It  would  be  possible  to  consider  this  a  priori  information  and  the  second 

observation  as  a  new  problem. 

In  the  computer  program  it  is  inconvenient  to  store  all  of  the 
intermediate  results  awaiting  a  decision  on  which  stages  are  to  be  lumped, 
An  alternate  method  for  carrying  out  the  lumping  operation  will  be  de- 
scribed.  Assume  as  above  that  it  has  been  decided  to  lump  the  first 
stage.  The  next  steps  in  the  filter  operation  would  normally  be  as  fol- 
lows. The  third  state  is  predicted  using  (83)  and  the  covariance  matrix 
augmented  accordingly  using  (84).   These  estimates  are  adjusted  for  the 
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third  observation.  The  main  minimization  procedure  is  begun.  Recall 
that  the  main  minimization  procedure  is  carried  out  in  three  steps,  an 
adjustment  for  each  observation.   After  the  first  of  these  steps  the  the 
intermediate  result  can  be  interpreted  as 


x 


inter. 


*(1/D 
*(2/l) 
*(3/l) 


(133) 


and 


inter. 


P(l,l/1)  P(l,2/1)  P(l,3/1) 
P(2,l/1)  P(2,2/l)  P(2,3/l) 
/>(3,1/1)  P(3,2/l)  P(3,3/l) 


(134) 


At  this  point  the  lumping  operation  is  carried  out  by  reducing  the  dimen- 
sion of  the  single  stage  to  two  (see  Figure  2)  and  storing  the  lower  part 

of  x,  (133)  and  the  lower- right  part  of  P,  (134)  in  the  area  as- 

Inter  inter 

signed  to  a  priori  information.  The  remaining  two  steps  of  the  main  mini- 
mization procedure  are  then  carried  out.  If  the  process  has  not  converged 
then  the  next  minimization  will  only  have  two  steps. 

This  completes  the  description  of  the  operation  of  the  filter  for 
this  very  simple  example. 
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11.  Target  Tracking. 

It  was  decided  to  exercise  the  scheme  on  as  realistic  a  problem  as 
could  be  found.  The  target  data  (which  was  fed  into  the  filter)  was  gen- 
erated by  a  sophisticated  simulator.  The  target  motion  is  the  result  of 
maneuver  commands  generated  by  the  user  of  the  simulation  scheme.  The 
simulator  then  computes  the  motion  of  the  target,  and  simulates  the  radar 
returns  which  that  target  would  generate.  The  simulated  radar  is  of  the 
search  type  having  as  available  outputs  range,  range  rate  and  three  dir- 
ection cosines  at  a  rate  of  one  frame  every  two  seconds.  The  simulator 
decides,  taking  into  account  the  relative  position  of  the  target  and  radar, 
whether  a  return  is  received.   If  a  return  is  received,  the  simulator  out- 
puts a  noisy  version  of  the  true  range,  range  rate  and  direction  cosines. 
If  no  return  is  received  the  simulator  sets  a  flag  in  the  output  data. 
At  long  ranges  the  chances  of  getting  a  return  are  relatively  small  but  as 
the  range  decreases  the  radar  gets  returns  more  and  more  consistently. 

Forming  the  Mathematical  Model 

It  should  be  noted  that  this  problem,  as  sketched  above,  does  not  fall 
directly  into  the  model  which  has  been  assumed  in  the  development  of  the 
technique.  Among  the  parameters  which  have  not  been  given  in  the  descrip- 
tion of  the  problem  are  $,  T,  B   and  even  x   (the  state  space).  This  is 
typical  of  the  way  in  which  a  problem  is  first  encountered.   What  follows 
will  be  a  series  of  engineering  approximations  which  yield  the  mathematical 
model.   This  model  forms  the  basis  for  the  filter  design. 

First  consider  the  stochastic  dynamic  model.  The  dynamic  model  may 
be  viewed  as  specifying  two  features  of  the  problem.   The  first  is  a  pre- 
diction function.   One  asks:   how  would  one  predict  some  future  state  of 
the  system  given  perfect  knowledge  of  the  present  state?  This  question  in 
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fact  helps  to  define  the  concept  of  state.  The  state  of  the  system  (for 
filtering  purposes)  is  that  collection  of  current  attributes  of  the  system 
which  has  a  bearing  on  the  future  of  the  system.   For  the  aircraft  target 
the  assumption  of  straight  and  level  flight  leads  to  an  assignment  of  pos- 
ition and  velocity  as  the  states.  The  prediction  function  is  based  upon 
the  assumption  of  constant  velocity.   Thus  the  components  of  the  state 
vector  are 

X1  =  north  position  (miles) 

Xj  =  north  velocity  (miles/sec) 

jt_  =  east  position  (miles) 

X,   ■  east  velocity  (miles/sec) 

X,   =  down  position  (miles) 

jc,  =  down  velocity  (miles/sec) 
and  the  prediction  function  is  linear  in  the  states  and  of  the  form 
je(k  +  l)p.   .  =  4\r  (k) .  $  is  the  discrete  time  form  of  three  independent 
double  integrators  for  a  sample  time  of  2  seconds. 


*  = 
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0 
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2 
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0 

1 
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0 
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2 

0 

0 

0 

0 

0 

1 

The  second  feature  which  the  model  must  provide  is  a  measure  of  the 
prediction  errors,  or  equivalently  I\  This  implies  that  the  prediction 

errors  are  random  variables  made  up  of  several  normalized  random  variables. 

T 
The  prediction  error  covariance  is  then  Q  =  T  T  »      For  this  problem 

F  was  chosen  under  the  assumption  that  in  each  direction  there  would  be  a 
step-wise  constant  component  of  acceleration  of  random  amplitude  having 
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zero  mean  and  variance  a   .  This  implied  that  T   takes  the  form 


r=  2 


*N 

0 

0 

*N 

0 

0 

0 

aE 

0 

0 

°E 

0 

0 

0 

0 

0 

0 

a, 

The  parameters  o  t  o  and  a  must  be  chosen  to  account  for  turbulence  and 
pilot  maneuvers,  which,  from  the  filter's  point  of  view,  appear  as  random 
accelerations. 

CTN=  .02 

aE  =   .02 
aD=  .0032 
Secondly  consider  the  observation  process.   Having  chosen  the  state 

space  x   it  is  straightforward  to  write  the  functions  h(x) 

2    2    2  % 
t^Of)  =  (xl   +  #3  +  #5)   =  range  (miles) 


(*jX2  +  <3*4  +  j^) 


h2(y)  =  — i — rr — =  range  rate  (miles/sec) 

G^  +  #3  +  x5) 


h3Cr)  =   2  ,   2  7   2,% 
^1  +  x3      x5) 


a  north  direction  cosine 


h4C*) 


,2.   2  ,   2,^ 
^1*3   *5' 


s  east  direction  cosine 


h5(*)  = 


,  2  ,   2  .  ""O 
^1   '3   *5* 


=  down  direction  cosine 


'3  ~5' 

The  noise  variance  was  approximated  from  considerations  related  to  the 
radar  simulator. 
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a  = 


1.5 

0 

0 

0 

0 

0 

4.0  x 

10' 

•6 

0 

0 

0 

0 

0 

1 

.8 

x   10' 

•5 
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0 

0 

0 

0 

3, 

1 

x  10"5 

0 

0 

0 

0 

0 

1 

.5 

x  10 

-4 


Finally,  the  a  priori  estimate  and  its  associated  covariance  must  be 
specified  in  order  to  complete  the  mathematical  model.  The  fact  that  the 
first  radar  return  has  been  received  places  the  target  in  a  certain  volume 
in  physical  space.  The  position  components  of  the  a  priori  estimate  were 
taken  as  the  centoid  of  that  volume  and  the  limits  of  the  volume  were  taken 
as  three  standard  deviations  on  either  side  of  the  centroid.  The  a  priori 
velocity  estimates  were  based  on  the  assumption  that  the  target  was  headed 
directly  toward  the  radar  (in  the  negative  north  direction).  The  magnitude 
of  the  velocity  was  taken  as  that  of  a  Mach  2  target.  The  covariance  of 
each  velocity  estimate  was  assumed  to  be  large  compared  to  the  square  of 
this  velocity.  The  a  priori  estimate  was  taken  as: 

"80.0 

-0.3 
0 
0 
0 

0  | 
The  a  priori  covariance  was  taken  as: 


*U/0)  = 
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P(l/0) 


400 

0 

0 

0 

0 

0 

0 

.09 

0 

0 

0 

0 

0 

0 

400 

0 

0 

0 

0 

0 

0 

.09 

0 

0 

0 

0 

0 

0 

400 

0 

0 

0 

0 

0 

0 

.09 

This  completes  the  process  of  abstracting  the  physical  situation  into 
the  form  of  the  mathematical  model. 

It  should  be  emphasized  that  the  abstraction  process  must  be  carried 
out  for  each  physical  system  that  generates  a  sequence  of  measurements.   If 
the  model  accurately  describes  the  conditions  under  which  the  measurements 
are  made  then  the  filter  can  be  expected  to  yield  estimates  which  are  best 
in  some  sense.   Even  an  accurate  model  and  an  optimum  filter  do  not  assure 
that  the  estimates  will  be  adequate  for  any  particular  purpose. 

The  Algorithm  Parameters 

There  are  three  parameters  that  define  the  iteration  termination  cri- 
teria. They  are  the  probability,  a,  that  a  minimum  cost  is  greater  than  a 
given  threshold,  C.;  the  level  of  statistical  significance,  P ;  and  the  num- 
ber of  significant  digits  used  by  the  computer. 

The  threshold,  C  ,  depends  upon  the  number  of  stages  carried  (which 

la 

determines  the  number  of  degrees  of  freedom)  as  well  as  upon  or.  There  are 
five  degrees  of  freedom  in  the  cost  for  each  stage  carried  since  the  system 
dynamics  (1)  introduce  six  constraints  and  the  observations  (5)  introduce 
five  constraints  and  there  are  but  six  adjustable  parameters  (the  components 
of  the  state  vector)  for  each  stage  carried.  An  or  of  0.05  was  chosen  in 
order  to  have  a  small  but  finite  number  of  cases  where  the  minimum  cost 
was  greater  than  C  . 
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The  likelihood- ratio- test  threshold,  C„,  also  depends  upon  the  num- 
ber  of  stages  carried.   The  number  of  degrees  of  freedom  is  six  for  each 
stage  carried  since  that  is  the  number  of  components  in  the  state  vector. 
A  significance  level  of  0.95  was  arbitrarily  chosen.   The  thresholds  C 

id 

and  C  are  shown  in  Table  II. 
M 

The  filter  was  implemented  on  a  CDC  1604  computer.   This  machine 
carries  about  ten  significant  figures.   Two  successive  iterations  were 
considered  to  be  equivalent  if  all  of  the  components  of  the  estimate  were 
equal  in  the  first  nine  significant  figures. 

The  lumping  criterion  is  based  on  a  comparison  of  the  covariance  of 
the  observation  errors  and  the  covariance  of  the  nonlinear  noise  intro- 
duced by  the  linearization  process.   For  concreteness ,  the  nonlinear 
noise  was  considered  to  be  negligible  when  its  covariance  was  less  than 
that  of  the  observation  errors  by  a  factor  of  ten. 

Target  Tracking  Results 

Three  target  trajectories  were  filtered.   The  results  were  quite 
similar.   The  target  on  which  the  largest  number  of  observations  were  re- 
ceived will  be  described  in  some  detail.  v 

Figures  5  through  9  are  a  graphical  display  of  the  tilter  operation. 
Figures  5  and  6  show  the  true  target  trajectory  projected  on  the  NORTH- 
EAST plane  and  the  NORTH- DOWN  plane.   Superimposed  on  the  true  trajectory 
are  confidence  areas  generated  by  the  filter.   The  boxes  are  used  to  pro- 
vide a  measure  of  the  quality  of  the  estimate.   The  size  and  shape  ot  the 
box  is  computed  from  the  covariance  matrix  of  the  estimate.   If  the  es- 
timation errors  were  Gaussian  with  a  covariance  equal  to  that  computed 
by  the  filter  the  box  would  have  the  following  interpretation.   There  is 
an  ellipse,  centered  about  the  estimate  which  contains  the  true  state 
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TABLE   II 
ITERATION  CONTROL   PARAMETERS   AS   A  FUNCTION  OF   THE  NUMBER 
OF  STAGES   CARRIED  FOR  cr  -.05   AND  3   -    .95 


NUMBER 

OF 

CL 

CM 

STAGES   CARRIED 

1 

11.07 

1.64 

2 

18.31 

5.23 

3 

25.00 

9.39 

4 

31.41 

13.85 

5 

37.65 

18.49 

6 

43.77 

23.02 

7 

49.55 

27.86 

8 

55.76 

32.85 

9 

61.33 

37.80 
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Fig.  5.   Projection  on  NORTH-EAST 
plane  of  true  trajectory 
with  estimates. 


Fig.  6.   Projection  on  NORTH-DOWN 
plane  of  true  trajectory 
with  estimates. 
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Fig.    7.      Observation   (pi)   and  estimation   (+)   errors   in 

NORTH  coordinate. 
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Fig.  8.   Observation  (jQ)  and  estimation  (+)  errors  in 

EAST  coordinate. 
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Fig.  9.   Observation  (3)  and  estimation  (+)  errors  in 

DOWN  coordinate. 
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with  probability  of  0.63  and  whose  boundary  is  a  curve  of  constant  prob- 
ability density.   The  vertices  of  the  box  are  the  extremities  of  the 
major  and  minor  axes  of  that  ellipse. 

Figures  7,  8,  and  9  show  the  estimation  errors  and  the  observation 
errors  in  each  of  the  position  coordinates.   In  addition  the  computed 
standard  deviation  of  the  estimates  is  shown  as  a  solid  curve. 

In  order  to  illustrate  the  ability  of  the  algorithm  to  converge  to 
a  least-squares  estimate  the  cost  is  given  in  Table  III  as  a  function  of 
the  number  of  iterations  and  the  number  of  observations  received.   The 
first  cost  listed  in  each  row  was  evaluated  at  an  estimate  X.   which  has 
not  been  adjusted  for  the  newly  received  observation.   The  estimate  x. 

for  a  (k+l)-stage  minimization  process  is  given  by  (89).   The  first  iter- 

2  1 

ation  yields  x.   by  adjusting  X.    for  the  most  recently  received  observation. 

Only  the  partial  derivatives  associated  with  this  new  observation  are 
evaluated  for  this  step.   This  step  is  comparable  to  the  simple  single- 
stage  linearization  employed  in  [4]  with  such  disappointing  results.   The 
second  row  indicates  the  danger  of  stopping  at  this  point.   Subsequent 
iterations  reevaluate  all  of  the  partial  derivatives  at  the  previous 
estimate.   All  of  the  observations  are  reprocessed  with  estimates  start- 
ing from  the  a  priori  estimate. 

The  cost  after  a  lumping  operation  is  only  the  sum  of  squares  of 
the  residuals  associated  with  stages  still  carried  by  the  filter.   The 
lumping  operation  occurs  in  the  middle  of  second  iteration  because  this 
is  the  first  time  that  the  intermediate  results  needed  to  form  the  new 
a  priori  estimate  of  the  remaining  stage  become  available  again  after  the 
decision  to  lump  has  been  made. 

The  time  required  to  produce  a  least-squares  estimate  is  tabulated 
in  Table  IV.   The  time  indicated  does  not  include  the  time  required  for 
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TABLE  III 

TYPICAL  SEQUENCE  OF  COSTS  AS  A  FUNCTION  OF  NUMBER 

ITERATIONS  AND  NUMBER  OF  OBSERVATIONS  PROCESSED 


OBSERVATION 

C(JPl) 

STAGES 

NUMBER 

i-1 

i-2 

i=3 

i=4 

CARRIED 

1 

367. 

0.57 

0.13 

1 

2 

377.8 

118.5 

2.07 

2.07 

2 

3 

35.4 

11.7 

11.16 

3 

4 

19.0 

12.9 

12.9 

4 

5 

48.7 

20.99 

20.99 

5 

6 

121.6 

27.36 

27.36 

6 

7 

811.9 

38.28 

38.06 

7 

8 

749.9 

101.3 

44.16 

44.16 

8 

9 

48.51 

44.4 

9.49* 

9.49 

2 

10 

12.27 

10.98 

1.51* 

1.51 

1 

*  A  lumping  operation  oc< 

:urred. 
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TABLE  IV 

TYPICAL  COMPUTATIONAL  TIME  REQUIREMENTS  FOR 
THE  CDC  1604  COMPUTER 


OBSER- 
VATION 
NUMBER 

COMPU- 
TATION 
TIME 
(SEC) 

CUMULATIVE 
COMPUTATION 

TIME 

(SEC) 

OBSERVATION 
ARRIVAL 
TIME 
(SIMULATED) 

1 

0.817 

0.817 

0.0 

2 

2.633 

36.633 

34.00 

3 

2.617 

48.617 

46.00 

4 

3.683 

52.300 

48.00 

5 

5.633 

65.633 

60.00 

6 

7.633 

73.266 

62.00 

7 

10.45 

83.716 

66.00 

8 

26.083 

109.800 

78.00 

9 

14 . 900 

114.700 

80.00 

10 

2.100 

116.800 

82.00 

11 

1.367 

118.167 

86.00 

22 

1.467 

134.050 

110.00 

34 

1.300 

148.417 

130.00 

63 

1.483 

196.017 

196.00 
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auxiliary  computations  performed  for  diagnostic  purposes  nor  the  time 
required  to  read  the  data  from  the  magnetic  tape.   It  does  include  all 
computations  inherent  in  the  filter  operation  such  as  evaluating  the 
cost  and  covariance  of  the  nonlinear  noise.   The  cumulative  running  time 
has  been  adjusted  to  reflect  the  fact  that  the  filter  cannot  begin  to 
compute  a  new  estimate  until  the  next  observation  is  available. 

Comparison  between  Table  III  and  Table  IV  yields  the  obvious  fact 
that  the  time  required  to  compute  the  estimate  is  highly  dependent  on 
the  number  of  stages  carried.   From  an  analysis  of  the  computations  in- 
volved in  Figure  3  it  can  be  shown  that  the  computations  increase  as  the 
square  of  the  number  of  stages  times  the  system  order.   The  computation 
time  depends  linearly  on  the  number  of  observations. 

The  operation  of  the  filter  indicated  for  the  tenth  observation  is 
typical  of  all  of  the  remaining  stages  in  both  number  of  iterations  and 
processing  time  required  with  the  exception  of  the  cases  where  the  minimum 
cost  was  greater  than  C  .   This  happened  3  times  out  of  67  observations  on 
the  longest  run.   In  each  case  additional  iterations  involved  only  a  single- 
stage.   Two  or  three  extra  iterations  were  needed  to  satisfy  the  termina- 
tion criterion.   The  largest  relative  change  in  any  component  of  the  es- 
timate decreased  approximately  two  orders  of  magnitude  after  each  itera- 
tion.  The  average  processing  time  for  these  three  cases  was  about  1.9 
seconds. 
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12.   Conclusions 

An  algorithm  has  been  developed  for  the  processing  of  a  sequence  of 
noisy,  nonlinear  observations  made  on  a  dynamic  system  whose  state  is  a 
random  function  of  time.   The  best  estimate  of  the  state  of  the  system 
at  each  observation  time  is  defined  to  be  the  weighted- least- squares  es- 
timate. 

This  estimate  is  computed  by  solving  a  sequence  of  linear  problems 
which  approximate  the  nonlinear  problem  more  and  more  closely.   A  method 
has  been  developed  for  automatically  determining  the  number  of  iterations 
required  to  compute  the  least- squares  estimate  by  the  above  procedure. 

The  computation  of  each  estimate  is  based  on  a  least-squares  fit  on 
only  a  finite  sequence  of  past  observations.   A  method  has  been  developed 
for  determining  the  length  of  this  sequence  of  past  observations.   The  in- 
formation contained  in  the  older  observations  is  carried  forward  in  the 
form  of  an  a  priori  estimate. 

The  radar  tracking  problem  is  an  example  of  the  type  of  problem  which 
falls  within  the  scope  of  this  investigation.   The  algorithm  was  implemented 
on  a  digital  computer  and  used  to  process  a  sequence  of  observations  pro- 
vided by  a  realistic  radar-target  simulator.   The  estimation  errors  were 
generally  within  the  expected  range,  considering  the  randomness  of  the 
dynamic  system  and  the  observation  errors.   The  algorithm  achieved  the 
least-squares  estimate  in  three  or  four  iterations.   The  length  of  the 
sequence  of  observations  on  which  the  least-squares  fit  was  based, rapidly 
settled  to  only  a  single  previous  observation.   The  computational  require- 
ments appear  excessive  when  compared  with  those  associated  with  linear 
observations.   There  are,  however,  no  other  generally  applicable  methods 
when  the  observations  are  nonlinear  and  there  is  little  prior  information 
about  the  state  of  the  system. 
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Appendix  I 

Discussion  of  singularity  of  P    . 

The  covariance  matrix  P     is  assumed  to  be  non-singular  throughout 

o 

the  development,  however;  there  is  a  meaningful  interpretation  for  the 

case  where  P     is  singular.   It  will  be  shown  that  the  filtering  equations 
o 

are  still  valid  in  view  of  this  interpretation  and  that  in  the  one 
instance  where  the  inverse  of  P     is  required  (in  evaluating  C)  the  use  of 
the  pseudo  inverse  is  appropriate. 

This  development  begins  by  defining  a  new  set  of  state  variables,  y, 
so  that  the  errors  in  the  a  priori  estimates  are  uncorrelated. 

y  =  Mx  (l) 

yQ  =  Uxq  (2) 

-1     T 
where  U  is  a  unitary  matrix  such  that  U   =  U  and 

UP  UT  =  D  (3) 

o 

where  D  is  a  diagonal  matrix. 

It  will  be  shown  now  that  D  is  the  covariance  of  the  a  priori 

estimates  in  the  y   states  . 

Cov  iyQ)   5  E[  (y-yo)    (y-yj   ] 

Cov   {y   )  -   E[UCc-jc   )    <x-x   )TUT] 
o  o  o 

Cov    (yQ)  -    UE[  (JC-XQ)    C*-*o)T]UT 

Cov    (yQ)  =   D  (4) 

If  p     is  singular  then  D  has  at  least  one  zero  on  the  diagonal  or, 
o 

to  be  more  specific,  the  rank  of  p     is  equal  to  the  rank  of  D  which  is 

o 

the  number  of  non-zero  elements  along  the  diagonal  of  D.  There  is  no 
loss  in  generality  in  assuming  that  all  of  the  non-zero  elements  of  D  are 
in  the  upper  part  of  the  diagonal .   The  upper  elements  of  y     are 
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conventional  statistical  estimates  of  the  corresponding  elements  of  the 

true  state  y,    having  a  variance  given  by  the  element  in  D.   On  the  other 

hand,  the  lower  elements  of  y     are  precise  or  exact  estimates  of  the 

corresponding  true  state  components  and  have  no  variance.   When  these 

interpretations  are  reflected  back  to  the  x   states  the  meaning  of  a 

singular  p     becomes  clear.  A  singular  P     implies  that  a  certain  number 
o  o 

of  linear  combinations  of  the  states  are  known  exactly. 

It  will  now  be  shown  that  the  filtering  equations,  used  repeatedly 
in  the  minimization  process,  produce  estimates  consistent  with  these 
interpretations.   That  is,  the  estimate  of  those  linear  combinations  of 
the  states  which  were  known  exactly  before  adjustment  for  observed  data 
are  not  affected  by  the  adjustment  process.   Further  the  covariance 
matrix,  p     of  the  adjusted  estimates  reflects  the  fact  that  these  linear 
combinations  are  known  exactly.   This  demonstration  will  be  carried  out 
using  the  y   state  space.  The  filter  equations  are  transformed  to  operate 
on  the  y   coordinates. 

Consider  first 

substituting  y   for  x   in  (5)  yields 

vtyl  -  uTi/0  +  PjFwpji*  t  W'1  O  '  #uTi/0]  »  (6) 

pre-multiplying  both  sides  of  (6)  yields 

yl  "  yo  +  WjFvPjF  +  ^l"1  O   -  H\?yQ-\    ,  (7) 

substituting  U  DU  for  P     from  (3) 

o 

y    =  yQ  +  DUtfT[/AJTDtWT  +  -ft]"1  \z  -  HlPy  ]    ,  (8) 

T 
and   finally  making  a  change  of  variables  f(  =  H\J     yields   the   filtering 


equation, 


yl  =  yo  "*"  vFWK*  +  ^^l"1  O  "  ^0]    •  (9) 
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in  the  y  states.     Since  the  lower  elements  of  D  are  zero  it  is  clear  that 
there   is  no  change  in  these  components  of  the  adjusted  estimate,  y    , 
after  application  of   (9). 

Now  consider  the  covariance  equation 

*1  "  Po  '  *fV*f  +  *   ]"'  ^o 
P.  *    UTDU  -   UTDU//[#UTDUtfT  +  /?]"WdU 

Px  =  UT[D  -  IXT  [KDK1  +  *]" JH>]U  (10) 

T 
multiplying  in  front  by  U  and  in  back  by  U 

Uf^U1  =  D  -  EKT  [KW  +  /?]" Vd  (11) 

Thus  p     reflects  the  fact  that  those  linear  combinations  of  x   which 

were  known  exactly  are  still  known  exactly. 

In  the  expression  for  the  cost  consider  only  the  first  term, 

\\x     -  Xi  „  »  where  W.  was  assumed  to  be  the  inverse  of  P     if  it  existed, 
o    1"W1         1  o 

Substituting  in  the  y   states  yields 

T 
Defining  Q  =   UWU  and  substituting  above  yields 

Define^  =  (y.    -  y   )  and  partition  ~y   into  two  subvectors 


2/  ■  - 


H 


where  the  lower  subvector  'y  t   =  0  from  (9) 
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Let  Q   be  partitioned  as 


[3  ;] 


where  D  is  the  upper,  non-zero,  part  of  D.   Thus  sum  of  the  residuals 

can  be  expressed  as 

2    in-  n  2 


u0-*iiiw  -  K  ii -i 


Du 


The  blank  submatrices  in  Q   above  are  immaterial  since  they  will  be 
multiplied  by  y .   =  0.   Arbitrarily  assigning  zero  submatrices  to  the 
blanks  implies  that  those  residuals  which  are  known  to  be  zero  are  given 
zero  weight.   This  leads  to 


Q  = 


d-1  01 

U 
0   0 


uw.uT 


T 
and  preraultiplying  by  U  and  postmultiplying  by  U  yields 


w^uT[!ul  ;]u 


but  this  is  just  an  expression  for  the  pseudo  inverse  of  P    . 

o 

Thus  if  the  pseudo  inverse  of  P     is  used  in  the  definition  of  the 
cost  there  will  be  no  weighting  of  the  residuals  in  certain  linear 
combinations  of  the  states.   On  the  other  hand,  if  the  x,    is  always 
computed  using  (5)  or  one  of  its  derivatives  then  these  particular 
residuals  will  always  be  zero. 

During  the  discussion  of  the  minimization  algorithm  the  non- 
singularity  p     was  an  important  assumption.   The  discussion  is  valid  for 


81 


the  case  of  p     singular  in  the  sense  that  a  new  minimization  problem  can 

be  defined  in  terms  of  y   ,  taking  y .=  y .    .  The  discussion  then  implies 

o    1 
that  the  change  in  the  estimate  of  y     has  a  positive  component  in  the 

direction  of  the  negative  gradient  of  C  with  respect  y  .     when  these 

conclusions  are  reflected  back  to  the  x   state  space  it  can  be  seen  that 

the  change  in  the  estimate  is  related  to  the  projection  of  the  gradient 

of  C  into  that  subspace  of  the  state  space  about  which  there  is  some 

uncertainty  and  for  which  an  adjustment  in  the  estimate  is  meaningful. 
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Appendix  II 

Consider  a  random  variable 

V  m  XTAX  (1) 

where  A  is  a  symmetric  matrix  and  X   is  a  Gaussian  random  variable 
with  mean  zero  and  covariance  P   .   It  is  desired  to  find  an  expression 
for  the  first  and  second  moment  of  the  random  variable  u .  The 
development  begins  by  making  a  change  of  variables 

x  -  Blty  (2) 

T 
where  B  is  a  decomposition  of  P   such  that  P   ■  BB  ,  U  is  a  unitary  matrix 

T 
as  yet  unspecified,  such  that  UU  =  I  and  y is  a  random  vector  of  zero 

mean  and  identity  covariance.   The  random  variable  V    is  expressed  in 

terms  of  y   by  substituting  (2)  in  (1). 

"  ■  J/TUTBTABU>  (3) 


Now  let  U  be  chosen  so  that 


UTBTABU  -  D  (4) 


where  D  is  a  diagonal  matrix.   Substituting  (4)  in  (3)  yields 

v   »  /ty  (5) 

or  v  can  be  expressed  in  terms  of  the  components  of  y 

v  m  Ldvl  (6) 

£  i-  l 

To  evaluate  the  first  moment  of  v    the  order  of  summation  and 
expectation  is  interchanged. 

E[u]  =    E[£  d  yh 
i 

E[y]  -  ^ECi^] 
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The  components  of  y   all  have  unit  variance. 

E|"u]  »  Z  d  (7) 

1 

Expressing  (7)  In  matrix  form  yields 

E[u]  =  trD  (7') 

2 
The  second  moment  is  evaluated  by  expanding  v     in  terms  of  the 

components  of  y. 

Eru2]  =  Era  <v2><r  *.ybl 
i  j    J  J 

Eru2]  =  E[T  d2y\  +  T  d  d  y2y]-] 

Er,2]  ,  -  d2^*]  +  J/iV^J1  (8) 

The  first  term  is  evaluated  by  recalling  that  the  fourth  central 

moment  of  a  unit-variance  Gaussian  random  variable  is  3.  Each 

element  of  the  second  term  can  be  factored  due  to  the  independence 

of  the  components  of  y, 

E[u2]  =  3T  d2  +  V.  d  d  E[y2A   E[y2'] 
i   1  ys   1  J   1     J 

EP2]  =  37  d2  +  T.   d.d, 
i  i  Wj  i  j 

Eru2]  =  2T  d2  +  (T  d  )2 
i      1 


Substituting  (7)  above  yields 


E[u2]  =  2£  d2  +  (Eru])2  (9) 

i  X 


This  implies  that  the  variance  of  u  is 


Varf"u]  =  2T   d2  (10) 

1 


or  in  matrix  form 
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Var[u]  =  2trrD2].  (10») 

The  results,  (7)  and  (10),  are  expressed  in  terms  of  the  parameters 
of  the  original  problem.   Substituting  (4)  in  (7')  yields 

E\v~\   =  tr[UTBTA  B  U  ]  . 
Taking  advantage  of  the  fact  that  trfAB]  =  tr[BA]  yields 

E[u]  =  tr[A  B  U  UTBT] 
and  cancelling  Che  unitary  matrices  yields 

Eru]  =  tr[AP]  .  (11) 

The  development  for  the  variance  procedes  along  analogous  lines. 
Var[u]  =  2trfUTBTA  B  U  UTBTA  B  U  ] 
Var[u]  =  tr[A  B  BTA  B  BT] 

Var[u]  =  tr[APAP]  (12) 

It  is  possible  to  consider  the  covariance  of  two  random  variables 

as  follows 

T 

V   s  x   A  x  (1) 

and  a  second  random  variable 

V    =  xTA'x  d') 

The  means  of  both  of  these  random  variables  are  known  from  (11)  and  an 
expession  will  be  developed  for  the  expected  value  of  the  product.  After 
making  the  same  change  of  variables  as  before,  the  expected  value  of  the 
product  can  be  expressed  as 

E[u'u]  =  E[^TC  y  yTD  y~\  (13) 


where 


This  is  equivalent  to 


C*   UTBTA'B  U  (14) 


E[u'u]  ==  tr(C  E[y  yTL  y  yT])  (15) 
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Examining  only  the  elements  of  the  matrix  inside  the  expected  value 

T 
operator,  it  is  noted  that  the  middle  term,  y   D  y,   is  a  scalar. 

J/TD  V  *  L  d.y2.  (16) 

i  *  x 
Thus,  the  elements  of  the  matrix  are 

■rVj  J  dk^]  (17) 

This  expected  value  is  zero  for  i^j  and  for  i"j  it  can  be  written  as 

M s  vb  -  3di  +£\ 

E[ii 2  **&  *  2di +  L  dk  <18) 

k  k 

Thus,    the  expected  value   is  a  diagonal  matrix  of  the   following  form 

E[W/TD^T]  -  2D  +   (2  d   )*I 

k     K 

Substituting  in  (15)  yields 

E[u  u]  -  tr  C[2D  +  (S  4)1  3 

k  k 

which  can  further  be  simplified  to 

E[U  »3  -  2tr  [cd3  +  [L  d  3trC  (19) 

k  K 

Considering  only  the  factors  of  the  last  term,  it  is  noted  that 

Ed  -  e[w3  (21) 

k    K 

and  that 

tr  C  -  tr[UTB  A  B  u3 

tr  C  ■  tr[AP3 

tr  C  -  E[l>  3  (22) 

Now  the  first  term  can  be  identified  with  the  covariance  from  the  general 
expression 
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Cov  [r  or  ]  -  E[>y2]  -   EC^]  Ef^]  (23) 

Cov[u'w]  -  2tr[c  D] 
Cov[u'u]  -  2tr[uTBTA'B  U  DJ 
Cov[y'u]  -  2tr[A'B  U  D  uV] 
Cov  [Vv]   -  2tr[A'B  U  UTBTA  B  U  uV] 

Cov[u'u]  -  2tr[A'P  A?]  (24) 

The  useful  results  of  this  analysis  are  (11)  and  (24)  since  (12)  is 
only  a  special  case  of  (24).   Since  these  results  will  be  used  as  a 
guide  even  when  the  random  variable  x   is  not  known  to  be  Gaussian  it  is 
worth  reviewing  just  where  the  assumptions  were  used.   The  factor  2 
which  appears  in  (24)  comes  directly  from  Gaussian  assumption  (i.e.,  the 
fourth  central  moment  is  3  times  the  second  central  moment  squared).   A 
second  result  of  the  Gaussian  assumption  is  that  the  new  random  variables 
y   are  statistically  independent  (used  in  (8)  and  (18)).   The  variables 
y   are  uncorrelated  (since  D  is  diagonal)  but  only  in  the  Gaussian  case 
does  this  imply  independence.   The  effect  of  this  assumption  is  diffi- 
cult to  evaluate.   On  the  other  hand  (8)  does  not  depend  upon  the 
Gaussian  assumption. 
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Appendix  III 


Program  Listings 
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